Reputation: 93
I have code that estimates a parameter beta in an ODE system, given that all parameters other than beta are known and that the peak of the 'epidemic' simulation is 10% of the starting population. However, I realise that solving for the root might not always work to find the value. Is there a way to use scipy.optimize to estimate beta instead, by taking the difference between the simulated peak and the 10% target, squaring it, and minimising that? This is the current code:
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import root

def peak_infections(beta, days=100):
    # Total population, N.
    N = 1000
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 10, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    J0 = I0
    # Contact rate, beta, and mean recovery rate, gamma, (in 1/days).
    gamma = 1/7
    # A grid of time points (in days)
    t = np.linspace(0, days, days + 1)

    # The SIR model differential equations.
    def deriv(y, t, N, beta, gamma):
        S, I, R, J = y
        dS = ((-beta * S * I) / N)
        dI = ((beta * S * I) / N) - (gamma * I)
        dR = (gamma * I)
        dJ = ((beta * S * I) / N)
        return dS, dI, dR, dJ

    # Initial conditions are S0, I0, R0
    # Integrate the SIR equations over the time grid, t.
    solve = odeint(deriv, (S0, I0, R0, J0), t, args=(N, beta, gamma))
    S, I, R, J = solve.T

    return np.max(I)/N

root(lambda b: peak_infections(b) - 0.1, x0=0.5).x
Using scipy.optimize(root(lambda b: peak_infections(b) - 0.1, x0=0.5).x) only returns a misuse-of-function error.
EDIT ---------------------------------------------------
I am wondering how this approach could be applied if, instead of having the 10% peak as the key piece of information, I had a dataframe of weekly new case numbers. How could a similar method take that data into account when estimating beta? Say we have
import pandas as pd
d = {'Week': [1, 2,3,4,5,6,7,8,9,10,11], 'incidence': [206.1705794,2813.420201,11827.9453,30497.58655,10757.66954,7071.878779,3046.752723,1314.222882,765.9763902,201.3800578,109.8982006]}
df = pd.DataFrame(data=d)
Now this is our data, rather than knowing that the peak of the simulation is 10% of the starting population N. How can this data be used in the minimisation to find an estimate of beta?
-----EDIT 2-------
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
from scipy.optimize import minimize
import pandas as pd
from scipy.optimize import leastsq

###############################################################################
########## WITH WEEKLY DATA
###############################################################################

#t = np.arange(0,84,7)
t = np.linspace(0, 77, 77+1)
d = {'Week': [t[7],t[14],t[21],t[28],t[35],t[42],t[49],t[56],t[63],t[70],t[77]], 'incidence': [206.1705794,2813.420201,11827.9453,30497.58655,10757.66954,7071.878779,3046.752723,1314.222882,765.9763902,201.3800578,109.8982006]}
df = pd.DataFrame(data=d)

#d = {'Week': t, 'incidence': [0,206.1705794,2813.420201,11827.9453,30497.58655,10757.66954,7071.878779,3046.752723,1314.222882,765.9763902,201.3800578,109.8982006]}
#df = pd.DataFrame(data=d)

def peak_infections(beta, df):
    # Weeks for which the ODE system will be solved
    #weeks = df.Week.to_numpy()
    # Total population, N.
    N = 100000
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 10, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    J0 = I0
    # Contact rate, beta, and mean recovery rate, gamma, (in 1/days).
    # reproductive no. R zero is beta/gamma
    gamma = 1/6  # rate should be in weeks now
    # A grid of time points
    t7 = np.arange(7, 84, 7)

    # The SIR model differential equations.
    def deriv(y, t7, N, beta, gamma):
        S, I, R, J = y
        dS = ((-beta * S * I) / N)
        dI = ((beta * S * I) / N) - (gamma * I)
        dR = (gamma * I)
        dJ = ((beta * S * I) / N)
        return dS, dI, dR, dJ

    # Initial conditions are S0, I0, R0
    # Integrate the SIR equations over the time grid, t.
    solve = odeint(deriv, (S0, I0, R0, J0), t7, args=(N, beta, gamma))
    S, I, R, J = solve.T

    return np.max(I)/N

def residual(x, df):
    # Total population, N.
    N = 100000
    incidence = df.incidence.to_numpy()/N
    return np.sum((peak_infections(x, df) - incidence) ** 2)

x0 = 0.5
res = minimize(residual, x0, args=(df,), method="Nelder-Mead").x
print(res)
Upvotes: 1
Views: 522
Reputation: 1476
Yes, you can do this using scipy.optimize.minimize.
One approach would be as follows:
from scipy.optimize import minimize

def residual(x):
    return (peak_infections(x) - 0.1) ** 2

x0 = 0.5
res = minimize(residual, x0, method="Nelder-Mead", options={'fatol': 1e-04})
print(res)
Right now this gives almost the same answer as the root method you posted, but it works as an alternative.
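For example, the estimated beta and the peak it produces can be checked directly (a minimal sketch, assuming the peak_infections function defined in your question; the variable name beta_est is just for illustration):

beta_est = res.x[0]  # minimize returns the solution as an array in res.x
print("estimated beta:", beta_est)
print("peak fraction at the estimate:", peak_infections(beta_est))  # should be close to 0.1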
As per the discussion in the comment section of this answer, and according to the edit to your question, I propose the following solution:
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import pandas as pd

d = {'Week': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'incidence': [206.1705794,2813.420201,11827.9453,30497.58655,10757.66954,7071.878779,3046.752723,1314.222882,765.9763902,201.3800578,109.8982006]}
df = pd.DataFrame(data=d)

def peak_infections(beta, df):
    # Weeks for which the ODE system will be solved
    weeks = df.Week.to_numpy()
    # Total population, N.
    N = 1000
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 10, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    J0 = I0
    # Contact rate, beta, and mean recovery rate, gamma (in 1/weeks).
    gamma = 1/7 * 7  # rate should be in weeks now
    # A grid of time points (in weeks)
    t = np.linspace(0, weeks[-1], weeks[-1] + 1)

    # The SIR model differential equations.
    def deriv(y, t, N, beta, gamma):
        S, I, R, J = y
        dS = ((-beta * S * I) / N)
        dI = ((beta * S * I) / N) - (gamma * I)
        dR = (gamma * I)
        dJ = ((beta * S * I) / N)
        return dS, dI, dR, dJ

    # Initial conditions are S0, I0, R0
    # Integrate the SIR equations over the time grid, t.
    solve = odeint(deriv, (S0, I0, R0, J0), t, args=(N, beta, gamma))
    S, I, R, J = solve.T

    return I/N

def residual(x, df):
    # Total population, N.
    N = 1000
    incidence = df.incidence.to_numpy()/N
    return np.sum((peak_infections(x, df)[1:] - incidence) ** 2)

x0 = 0.5
res = minimize(residual, x0, args=(df,), method="Nelder-Mead", options={'fatol': 1e-04})
print(res)
Here, I solve the ODE system over 11 weeks and compare the result directly with the 11 incidence values from the provided dataframe. The squared differences are taken element by element, summed, and that sum is minimized. The result, however, is not very promising.
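One way to see why the fit is poor is to plot the model trajectory at the fitted beta against the observed weekly incidence (a minimal sketch, assuming the df, peak_infections and res objects from the code above; the names beta_est and model are just for illustration):

import matplotlib.pyplot as plt

beta_est = res.x[0]                      # fitted beta from the Nelder-Mead run
model = peak_infections(beta_est, df)    # I/N on the weekly grid 0..11
weeks = df.Week.to_numpy()

plt.plot(np.linspace(0, weeks[-1], weeks[-1] + 1), model, label="model I/N")
plt.scatter(weeks, df.incidence.to_numpy() / 1000, color="red", label="observed incidence / N")
plt.xlabel("week")
plt.ylabel("fraction of population")
plt.legend()
plt.show()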
Upvotes: 1