Reputation: 5354
I'm having some difficulty understanding how to use a GLM model with the Poisson family.
import numpy as np
import pandas as pd
import statsmodels.api as sm  # formerly scikits.statsmodels
dataset = pd.DataFrame({'A':np.random.rand(100)*1000,
'B':np.random.rand(100)*100,
'C':np.random.rand(100)*10,
'target':np.random.rand(100)})
X = dataset.loc[:,['A','B','C']].values
y = dataset.loc[:,['target']].values
size = 1e5
nbeta = 3
fam = sm.families.Poisson()
glm = sm.GLM(y,X, family=fam)
res = glm.fit()
Upvotes: 0
Views: 2369
Reputation: 8283
SourceForge is down right now. When it's back up, you should read through the documentation and examples; there are plenty of usage notes for prediction and GLM.
How to label your target is up to you and is probably a question for Cross Validated. Poisson is intended for counts; it can be used on continuous data, but you should know what you're doing.
If your target is 0/1, then you want a Logit or Probit model instead. For counts, something like this works. You don't need to convert the pandas objects to numpy.
import numpy as np
import pandas as pd
import statsmodels.api as sm
dataset = pd.DataFrame({'A':np.random.rand(100)*1000,
'B':np.random.rand(100)*100,
'C':np.random.rand(100)*10,
'target':np.random.randint(0, 5, 100)})
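X = dataset[['A','B','C']].copy()  # copy so the new column isn't set on a slice of dataset
X['constant'] = 1  # GLM does not add an intercept on its own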
y = dataset['target']
size = 1e5
nbeta = 3
fam = sm.families.Poisson()
glm = sm.GLM(y,X, family=fam)
res = glm.fit()
predict = res.predict()
Or you could directly use the maximum likelihood estimator for Poisson.
res = sm.Poisson(y, X).fit()
predict = res.predict()
Upvotes: 2