Creating and Exporting Probability Distributions for the Probabilistic SEIR Models

Begin with library imports and a few various settings, as well as determining the overall population size for the probability distributions and SEIR models.

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import math
import numpy as np
import random
import seaborn as sns
import pandas as pd
from scipy import stats
from scipy.stats import beta
import ipywidgets as widgets
from IPython.display import display
import pickle

sns.set()
plt.rcParams["figure.figsize"] = (20,10)

%matplotlib notebook

N = 10_000 # population size

PERT Probability Distribution Creation Function

This function creates and returns a PERT beta distribution given expected mode/peak, min, and max values, as well as a sample size. Here we use the population size set above for all our created probability distributions. The shape of the PERT distribution can be altered with the temp value (lower value results in a flatter distribution - with 0 being a uniform distribution), which is defaulted to 4 if no value is provided.

def pert_dist(peak, low, high, sample_size=N, temp=4):
    
    pert = []
    pert_loc = low
    pert_scale = high - low
    
    con_1 = 1 + temp * (peak - low) / (high - low)
    con_0 = 1 + temp * (high - peak) / (high - low)
    
    pert = np.random.beta(con_1, con_0, sample_size)
    pert = pert_loc + pert_scale * pert
    return pert

Finding best fit PERT/beta distributions using the find_beta function

Many - if not most - of the data we are working with for our models has estimated means and confidence intervals for low/high values. Wanting to use this information to create best guess distributions, we use the find_beta function created below to find a best fit PERT/beta distribution given the known estimates. Inputs to the function are the mean and low/high CI values.

def find_beta(exp_mu, exp_low, exp_high, exp_lda=4):
    mu_range = np.linspace(exp_low, exp_high, 40)
    low_range = np.linspace(0, exp_mu, 40)
    high_range = np.linspace(exp_mu, (exp_high * 2), 40)
    test_res, mu_res, low_res, high_res, mean_err_rec, exp_mu_rec = [], [], [], [], [], []
    
    for mu in mu_range:
        for low in low_range:
            for high in high_range:
                if low >= mu:
                    low = mu * .8
                if high <= mu:
                    high = mu * 1.2
                test_dist = pert_dist(mu, low, high, 1000, exp_lda)
                test_mean = np.mean(test_dist)
                test_low = np.percentile(test_dist, 10)
                test_high = np.percentile(test_dist, 90)
                mean_err = (test_mean - exp_mu)**2
                low_err = (test_low - exp_low)**2
                high_err = (test_high - exp_high)**2
                test_res.append(mean_err + np.sqrt(mean_err + low_err + high_err))
                mu_res.append(mu)
                low_res.append(low)
                high_res.append(high)
                mean_err_rec.append(np.sqrt(mean_err))
                exp_mu_rec.append(exp_mu)
                
    full_results = pd.DataFrame({
        'Test Results': pd.Series(test_res),
        'Mode Value': pd.Series(mu_res),
        'Low Value': pd.Series(low_res),
        'High Value': pd.Series(high_res),
        'mean_err': pd.Series(mean_err_rec),
        'expected mu': pd.Series(exp_mu_rec)
    })
    return full_results

The df_dist class

The df_dist class is created by providing the expected mean, low/high CI, distribution size, beta temperature (see above to explanation of beta temps), and a name. Only mean and low/high values are required, all others have a default. The class is created by finding a best fit PERT distribution that is then available as an attribute (df_dist.pert). The df_dist class also has a checkParams function that both provides an information display about desired and resulting mean/low/high CI values, as well as a histogram plot of the resulting probability distribution. Finally, the df_dist class has a getSample() method for returning a random sample from the instance’s PERT/beta distribution.

class df_dist:
    
    def __init__(self, dist_mu, dist_low, dist_high, dist_size=10_000, dist_temp=4, name='Unknown'):
        self.mu = dist_mu
        self.low = dist_low
        self.high = dist_high
        self.temp = dist_temp
        self.dsize = dist_size
        self.name = name
        
        self.df = find_beta(self.mu, self.low, self.high, self.temp)
        self.df.sort_values(by=['Test Results'], inplace=True)
        self.new_mu, self.new_low, self.new_high = list(self.df.iloc[0, 1:4])
        self.pert = pert_dist(self.new_mu, self.new_low, self.new_high, self.dsize, self.temp)
        
    def checkParams(self):
        print(f'Pert Variables: Mode: {self.new_mu}, Low: {self.new_low}, High: {self.new_high}')
        
        fig, ax = plt.subplots(1, figsize=(8,5))
        ax.hist(self.pert, 25)
        ax.set_title(f'{self.name} Probability Distribution Histogram')
        ax.set_xlabel('$value$')
        ax.set_ylabel('$count$')
        plt.show()
        
        print(f'Target Values: Mean={self.mu}, Low={self.low}, High={self.high}')
        print(f'Dist. Values: Mean={np.mean(self.pert)}, Low={np.percentile(self.pert, 10)}, High={np.percentile(self.pert, 90)}')

    def getSample(self):
        return random.sample(list(self.pert), 1)[0]

Creating the probability distributions for each of the required variables for a probabilistic SEIR model

# length of incubation period
# Defaults: mean 5.1, min CI value of 1.9, max CI value of 15.6
e2i = df_dist(5.1, 2.5, 11.5, N, 4, 'Incubation Period')

# spread mostly likely during infectious period, unlikely once recovery begins: https://www.statnews.com/2020/03/09/people-shed-high-levels-of-coronavirus-study-finds-but-most-are-likely-not-infectious-after-recovery-begins/
# infectious period, time to begining recovery after exposure to infectious transition
# Defaults: 
i2r = df_dist(7, 4, 12, N, 4, 'Infectious Period')

# (.009 default) case fatality rate, https://wwwnc.cdc.gov/eid/article/26/6/20-0320_article
# Defaults: mean .009, min CFR (.0025), max CFR (.03 default)
cfr = df_dist(.015, .0025, .03, name='Case Fatality Rate')

# H1N1 ratio for confirmed cases to estimated total infected
c2i = df_dist((1/79), (1/148), (1/47), name='Confirmed to Infectious Ratio')

# H1N1 ratio for hospitalized to infected
h2i = df_dist(.0045, .0016, .012, name='Hospitalized to Infectious Ratio')

# person-to-person spread: primary means of propagation https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fabout%2Ftransmission.html
freq_cont = df_dist(4, 0, 6, name='Daily Frequent Contacts')
infreq_cont = df_dist(3, 0, 5, name='Daily Infrequent Contacts')

Exporting the distributions for later import and usage

Exporting the above distribution class objects for later use in a modified application in order to save time and computational energy by avoiding rerunning the intense distribution finding algorithms of the distribution classes.

filename = 'e2i_object'
outfile = open(filename, 'wb')
pickle.dump(e2i, outfile)
outfile.close()

filename = 'i2r_object'
outfile = open(filename, 'wb')
pickle.dump(i2r, outfile)
outfile.close()

filename =  'c2i_object'
outfile = open(filename, 'wb')
pickle.dump(c2i, outfile)
outfile.close()

filename = 'h2i_object'
outfile = open(filename, 'wb')
pickle.dump(h2i, outfile)
outfile.close()

filename = 'freq_cont_object'
outfile = open(filename, 'wb')
pickle.dump(freq_cont, outfile)
outfile.close()

filename = 'infreq_cont_object'
outfile = open(filename, 'wb')
pickle.dump(infreq_cont, outfile)
outfile.close()

filename = 'cfr_object'
outfile = open(filename, 'wb')
pickle.dump(cfr, outfile)
outfile.close()

Example of the checkParams() method and associated output

e2i.checkParams()

Pert vars: mode: 2.5, low: 0.6538461538461537, high: 23.0

<IPython.core.display.Javascript object>

Original inputs: Mean=5.1, Low=2.5, High=11.5
New outputs: Mean=5.64894409721856, Low=1.681319405861826, High=10.66456232799803

Example of the results output DataFrame calculated by the find_beta() method

e2i.df.head()

	Test Results	Mode Value	Low Value	High Value	mean_err	expected mu
239	1.315555	2.500000	0.653846	23.000000	0.496877	5.1
318	1.316557	2.500000	0.915385	22.541026	0.543565	5.1
1958	1.317188	2.730769	1.046154	22.541026	0.543493	5.1
357	1.411159	2.500000	1.046154	22.082051	0.481530	5.1
1715	1.461687	2.730769	0.261538	21.164103	0.329377	5.1

The df_dist class objects can return instance attribute information and return distribution samples

e2i.name

'Incubation Period'

print(f'{freq_cont.name} distribution size: {freq_cont.dsize}')

Daily Frequent Contacts distribution size: 10000

cfr.getSample()

0.010151197124524818

for i in range(5):
    print(i2r.getSample())

0101533539017336
556873291853238
6950106403153087
661928818757324
7697746510448935