Creating and Exporting Probability Distributions for the Probabilistic SEIR Models
Begin with library imports and a few various settings, as well as determining the overall population size for the probability distributions and SEIR models.
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import math
import numpy as np
import random
import seaborn as sns
import pandas as pd
from scipy import stats
from scipy.stats import beta
import ipywidgets as widgets
from IPython.display import display
import pickle
sns.set()
plt.rcParams["figure.figsize"] = (20,10)
%matplotlib notebook
N = 10_000 # population size
PERT Probability Distribution Creation Function
This function creates and returns a PERT beta distribution given expected mode/peak, min, and max values, as well as a sample size. Here we use the population size set above for all our created probability distributions. The shape of the PERT distribution can be altered with the temp value (lower value results in a flatter distribution - with 0 being a uniform distribution), which is defaulted to 4 if no value is provided.
def pert_dist(peak, low, high, sample_size=N, temp=4):
pert = []
pert_loc = low
pert_scale = high - low
con_1 = 1 + temp * (peak - low) / (high - low)
con_0 = 1 + temp * (high - peak) / (high - low)
pert = np.random.beta(con_1, con_0, sample_size)
pert = pert_loc + pert_scale * pert
return pert
Finding best fit PERT/beta distributions using the find_beta function
Many - if not most - of the data we are working with for our models has estimated means and confidence intervals for low/high values. Wanting to use this information to create best guess distributions, we use the find_beta function created below to find a best fit PERT/beta distribution given the known estimates. Inputs to the function are the mean and low/high CI values.
def find_beta(exp_mu, exp_low, exp_high, exp_lda=4):
mu_range = np.linspace(exp_low, exp_high, 40)
low_range = np.linspace(0, exp_mu, 40)
high_range = np.linspace(exp_mu, (exp_high * 2), 40)
test_res, mu_res, low_res, high_res, mean_err_rec, exp_mu_rec = [], [], [], [], [], []
for mu in mu_range:
for low in low_range:
for high in high_range:
if low >= mu:
low = mu * .8
if high <= mu:
high = mu * 1.2
test_dist = pert_dist(mu, low, high, 1000, exp_lda)
test_mean = np.mean(test_dist)
test_low = np.percentile(test_dist, 10)
test_high = np.percentile(test_dist, 90)
mean_err = (test_mean - exp_mu)**2
low_err = (test_low - exp_low)**2
high_err = (test_high - exp_high)**2
test_res.append(mean_err + np.sqrt(mean_err + low_err + high_err))
mu_res.append(mu)
low_res.append(low)
high_res.append(high)
mean_err_rec.append(np.sqrt(mean_err))
exp_mu_rec.append(exp_mu)
full_results = pd.DataFrame({
'Test Results': pd.Series(test_res),
'Mode Value': pd.Series(mu_res),
'Low Value': pd.Series(low_res),
'High Value': pd.Series(high_res),
'mean_err': pd.Series(mean_err_rec),
'expected mu': pd.Series(exp_mu_rec)
})
return full_results
The df_dist class
The df_dist class is created by providing the expected mean, low/high CI, distribution size, beta temperature (see above to explanation of beta temps), and a name. Only mean and low/high values are required, all others have a default. The class is created by finding a best fit PERT distribution that is then available as an attribute (df_dist.pert). The df_dist class also has a checkParams function that both provides an information display about desired and resulting mean/low/high CI values, as well as a histogram plot of the resulting probability distribution. Finally, the df_dist class has a getSample() method for returning a random sample from the instance’s PERT/beta distribution.
class df_dist:
def __init__(self, dist_mu, dist_low, dist_high, dist_size=10_000, dist_temp=4, name='Unknown'):
self.mu = dist_mu
self.low = dist_low
self.high = dist_high
self.temp = dist_temp
self.dsize = dist_size
self.name = name
self.df = find_beta(self.mu, self.low, self.high, self.temp)
self.df.sort_values(by=['Test Results'], inplace=True)
self.new_mu, self.new_low, self.new_high = list(self.df.iloc[0, 1:4])
self.pert = pert_dist(self.new_mu, self.new_low, self.new_high, self.dsize, self.temp)
def checkParams(self):
print(f'Pert Variables: Mode: {self.new_mu}, Low: {self.new_low}, High: {self.new_high}')
fig, ax = plt.subplots(1, figsize=(8,5))
ax.hist(self.pert, 25)
ax.set_title(f'{self.name} Probability Distribution Histogram')
ax.set_xlabel('$value$')
ax.set_ylabel('$count$')
plt.show()
print(f'Target Values: Mean={self.mu}, Low={self.low}, High={self.high}')
print(f'Dist. Values: Mean={np.mean(self.pert)}, Low={np.percentile(self.pert, 10)}, High={np.percentile(self.pert, 90)}')
def getSample(self):
return random.sample(list(self.pert), 1)[0]
Creating the probability distributions for each of the required variables for a probabilistic SEIR model
# length of incubation period
# Defaults: mean 5.1, min CI value of 1.9, max CI value of 15.6
e2i = df_dist(5.1, 2.5, 11.5, N, 4, 'Incubation Period')
# spread mostly likely during infectious period, unlikely once recovery begins: https://www.statnews.com/2020/03/09/people-shed-high-levels-of-coronavirus-study-finds-but-most-are-likely-not-infectious-after-recovery-begins/
# infectious period, time to begining recovery after exposure to infectious transition
# Defaults:
i2r = df_dist(7, 4, 12, N, 4, 'Infectious Period')
# (.009 default) case fatality rate, https://wwwnc.cdc.gov/eid/article/26/6/20-0320_article
# Defaults: mean .009, min CFR (.0025), max CFR (.03 default)
cfr = df_dist(.015, .0025, .03, name='Case Fatality Rate')
# H1N1 ratio for confirmed cases to estimated total infected
c2i = df_dist((1/79), (1/148), (1/47), name='Confirmed to Infectious Ratio')
# H1N1 ratio for hospitalized to infected
h2i = df_dist(.0045, .0016, .012, name='Hospitalized to Infectious Ratio')
# person-to-person spread: primary means of propagation https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fabout%2Ftransmission.html
freq_cont = df_dist(4, 0, 6, name='Daily Frequent Contacts')
infreq_cont = df_dist(3, 0, 5, name='Daily Infrequent Contacts')
Exporting the distributions for later import and usage
Exporting the above distribution class objects for later use in a modified application in order to save time and computational energy by avoiding rerunning the intense distribution finding algorithms of the distribution classes.
filename = 'e2i_object'
outfile = open(filename, 'wb')
pickle.dump(e2i, outfile)
outfile.close()
filename = 'i2r_object'
outfile = open(filename, 'wb')
pickle.dump(i2r, outfile)
outfile.close()
filename = 'c2i_object'
outfile = open(filename, 'wb')
pickle.dump(c2i, outfile)
outfile.close()
filename = 'h2i_object'
outfile = open(filename, 'wb')
pickle.dump(h2i, outfile)
outfile.close()
filename = 'freq_cont_object'
outfile = open(filename, 'wb')
pickle.dump(freq_cont, outfile)
outfile.close()
filename = 'infreq_cont_object'
outfile = open(filename, 'wb')
pickle.dump(infreq_cont, outfile)
outfile.close()
filename = 'cfr_object'
outfile = open(filename, 'wb')
pickle.dump(cfr, outfile)
outfile.close()
Example of the checkParams() method and associated output
e2i.checkParams()
Pert vars: mode: 2.5, low: 0.6538461538461537, high: 23.0
<IPython.core.display.Javascript object>
Original inputs: Mean=5.1, Low=2.5, High=11.5
New outputs: Mean=5.64894409721856, Low=1.681319405861826, High=10.66456232799803
Example of the results output DataFrame calculated by the find_beta() method
e2i.df.head()
Test Results | Mode Value | Low Value | High Value | mean_err | expected mu | |
---|---|---|---|---|---|---|
239 | 1.315555 | 2.500000 | 0.653846 | 23.000000 | 0.496877 | 5.1 |
318 | 1.316557 | 2.500000 | 0.915385 | 22.541026 | 0.543565 | 5.1 |
1958 | 1.317188 | 2.730769 | 1.046154 | 22.541026 | 0.543493 | 5.1 |
357 | 1.411159 | 2.500000 | 1.046154 | 22.082051 | 0.481530 | 5.1 |
1715 | 1.461687 | 2.730769 | 0.261538 | 21.164103 | 0.329377 | 5.1 |
The df_dist class objects can return instance attribute information and return distribution samples
e2i.name
'Incubation Period'
print(f'{freq_cont.name} distribution size: {freq_cont.dsize}')
Daily Frequent Contacts distribution size: 10000
cfr.getSample()
0.010151197124524818
for i in range(5):
print(i2r.getSample())
5.0101533539017336
9.556873291853238
3.6950106403153087
6.661928818757324
6.7697746510448935