Reputation: 11
I'm conducting a case study where I have to predict the number of claims per policy. Since my target variable ClaimNb is a count rather than binary, I can't use logistic regression, so I use Poisson regression instead. My code for the GLM model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
formula = 'ClaimNb ~ BonusMalus + VehAge + Freq + VehGas + Exposure + VehPower + Density + DrivAge'
model = smf.glm(formula=formula, data=df,
                family=sm.families.Poisson())
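For reference, this is the model that specification describes (the Poisson family in statsmodels uses the log link by default). Each policy's claim count is Poisson with a mean λ_i that is log-linear in the covariates. The covariate list is abbreviated here, and the fitted model's predictions are estimates of λ_i, i.e. expected counts rather than integers:

```latex
\begin{aligned}
\Pr(\mathrm{ClaimNb}_i = y) &= \frac{\lambda_i^{y}\, e^{-\lambda_i}}{y!}, \qquad y = 0, 1, 2, \dots \\
\log \lambda_i &= \beta_0 + \beta_1\,\mathrm{BonusMalus}_i + \beta_2\,\mathrm{VehAge}_i + \dots + \beta_k\,\mathrm{DrivAge}_i
\end{aligned}
```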
I have also split my data:
from sklearn.model_selection import train_test_split
# train-test split
train, test = train_test_split(df, test_size=0.2, random_state=0)
# separate the target and the independent variables
train_x = train.drop(columns=['ClaimNb'])
train_y = train['ClaimNb']
test_x = test.drop(columns=['ClaimNb'])
test_y = test['ClaimNb']
My problem now is the prediction. I tried the following, but it did not work:
from sklearn.linear_model import PoissonRegressor
model = PoissonRegressor(alpha=1e-3, max_iter=1000)
model.fit(train_x,train_y)
predict = model.predict(test_x)
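The thread doesn't show the error, so this is only a guess: PoissonRegressor accepts numeric input only, so a string column such as VehGas left in train_x would make fit fail. A minimal sketch, assuming that is the problem, one-hot encodes the categorical column inside a pipeline before fitting. The data, its column values, and the coefficients used to simulate it are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for the real policy table.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "BonusMalus": rng.integers(50, 150, n),
    "VehAge": rng.integers(0, 20, n),
    "DrivAge": rng.integers(18, 80, n),
    "VehGas": rng.choice(["Regular", "Diesel"], n),  # string column
})
# Simulate claim counts from a known log-linear Poisson model.
lam = np.exp(-3 + 0.01 * data["BonusMalus"] - 0.02 * data["VehAge"])
data["ClaimNb"] = rng.poisson(lam)

train, test = train_test_split(data, test_size=0.2, random_state=0)
train_x = train.drop(columns=["ClaimNb"])
train_y = train["ClaimNb"]
test_x = test.drop(columns=["ClaimNb"])

# One-hot encode the string column and standardize the numeric ones,
# so PoissonRegressor only ever sees numeric features.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["VehGas"]),
    ("num", StandardScaler(), ["BonusMalus", "VehAge", "DrivAge"]),
])
model = make_pipeline(preprocess, PoissonRegressor(alpha=1e-3, max_iter=1000))
model.fit(train_x, train_y)

# Predicted expected claim counts for the test policies.
predict = model.predict(test_x)
print(predict[:5])
```

Putting the encoder in the pipeline means the same transformation fitted on the training rows is reused on the test rows.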
Is there another way to make predictions and to check the accuracy of the model?
Thanks
Upvotes: 0
Views: 5034
Reputation: 46948
You need to assign the result of model.fit() and call predict on that result; statsmodels differs from sklearn here. Also, if you are using the formula interface, it is better to split your dataframe into train and test sets, fit on the training set, and predict on the split you want. For example:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,(50,4)),columns=['ClaimNb','BonusMalus','VehAge','Freq'])
#X = df[['BonusMalus','VehAge','Freq']]
#y = df['ClaimNb']
df_train = df.sample(round(len(df)*0.8))
df_test = df.drop(df_train.index)
formula= 'ClaimNb ~ BonusMalus+VehAge+Freq'
# fit on the training split only
model = smf.glm(formula=formula, data=df_train, family=sm.families.Poisson())
result = model.fit()
Then we can predict on the training data:
result.predict(df_train)
or on the held-out test data:
result.predict(df_test)
Upvotes: 1