ramblor

Reputation: 21

Using weights in a Poisson model with the Statsmodels module

I'm trying to convert the following code from R to Python using the Statsmodels module:

model <- glm(goals ~ att + def + home - (1), data=df, family=poisson, weights=weight)

I've got a similar dataframe (named df) using pandas, and currently have the following line in Python (version 3.4 if it makes a difference):

model = sm.Poisson.from_formula("goals ~ att + def + home - 1", df).fit()

Or, using GLM:

smf.glm("goals ~ att + def + home - 1", df, family=sm.families.Poisson()).fit()

However, I can't get the weighting term to work. Each record in the dataframe has a date, and I want more recent records to count for more when fitting the model than older ones. I've not seen an example of weights being used, but surely if it can be done in R, it can be done in Statsmodels... right?

Upvotes: 2

Views: 2461

Answers (3)

theFriedBee

Reputation: 1

There are two ways to set up weights for Poisson regression. The first is to use freq_weights in the GLM function, as mentioned by MarkWPiper. The second is to stay with sm.Poisson and pass the weights as exposure. As the documentation puts it: "Log(exposure) is added to the linear prediction with coefficient equal to 1." This is the same mathematical trick that Yaron describes, although the parameter was originally meant for a different purpose. Sample code:

import statsmodels.api as sm
# or: from statsmodels.discrete.discrete_model import Poisson
fitted = sm.Poisson.from_formula("goals ~ att + def + home - 1", data=df, exposure=df['weight']).fit()

Upvotes: 0

MarkWPiper

Reputation: 923

freq_weights is now supported on GLM with the Poisson family, but unfortunately not on sm.Poisson.

To use it, pass freq_weights when creating the GLM:

import statsmodels.api as sm
import statsmodels.formula.api as smf

formula = "goals ~ att + def + home - 1"
smf.glm(formula, df, family=sm.families.Poisson(), freq_weights=df['freq_weight']).fit()

Upvotes: 2

Yaron

Reputation: 1852

I've encountered the same issue. There is a workaround that should lead to the same results: add the weight on a logarithmic scale (np.log(weight)) as one of the explanatory variables, with its beta fixed at 1 (the offset option). I can see there is also an exposure option, which does the same thing as I explained above.

Upvotes: 0
