Python create linear regression prediction pandas dataframe by group

Question

I want to create a pandas DataFrame column based on a linear regression run in a loop.

This is the source pandas DataFrame, df:

campaign    |     date     |    shown 
   A           2015-10-11       363563
   A           2015-10-12       345657
   A           2015-10-13       345346
   B           2015-10-11       23467
   B           2015-10-15       357990
   C           2015-10-11       97808

I want to use linear regression to predict, for each group, the amount shown on 2015-11-30.

So this is the final new prediction dataframe I am looking for:

 campaign |   Prediction(2015-11-30)
      A           ...
      B           ...
      C           ...

My code so far:

df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
model = LinearRegression()
X = df[['date_ordinal']]
y = df.shown
model.fit(X, y)   

df_results = pd.DataFrame()
for (group, df_gp) in df.groupby('campaign'):
   df_results['campaign'] = group
   X=df_gp[['date_ordinal']]
   y=df_gp.shown
   model.fit(X,y)
   coefs = list(zip(X.columns, model.coef_))
   df_results['prediction'] = model.predict(735947)

df_results

However, when I run this code I get only one prediction. I don't get a dataframe with one column for the group and the predicted value next to it.

Thank You!

Karl Anka · Accepted Answer

Try this:

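import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression()
groups = []
results = []
for (group, df_gp) in df.groupby('campaign'):
    X = df_gp[['date_ordinal']]
    y = df_gp.shown
    model.fit(X, y)
    coefs = list(zip(X.columns, model.coef_))
    # predict() expects 2-D input: one row per sample, one column per feature
    X_new = pd.DataFrame({'date_ordinal': [735947]})
    results.append(model.predict(X_new)[0])
    groups.append(group)

# Build the result once from the collected lists, one row per campaign
df_results = pd.DataFrame({'campaign': groups, 'prediction': results})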

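In the original loop, df_results['campaign'] = group assigns to the whole column on every pass, so each iteration overwrites the previous one instead of adding a row. According to the answers to "add one row in a pandas.DataFrame", adding rows one by one is not the most efficient solution anyway. As those answers also show, data must be inserted at an index.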
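For reference, here is a minimal self-contained sketch of the same idea, using the sample rows from the question. It collects one dict per campaign and builds the DataFrame at the end. The names `target` and `rows` are illustrative and not from the original post. It also fits a fresh `LinearRegression` for each group and derives the target ordinal from the date rather than typing it in by hand; both are assumptions about what is wanted.

```python
import datetime

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data from the question.
df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "date": pd.to_datetime([
        "2015-10-11", "2015-10-12", "2015-10-13",
        "2015-10-11", "2015-10-15", "2015-10-11",
    ]),
    "shown": [363563, 345657, 345346, 23467, 357990, 97808],
})
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())

# Derive the target ordinal from the date instead of hard-coding it.
# Using a DataFrame with the same column name keeps the input 2-D.
target = pd.DataFrame(
    {"date_ordinal": [datetime.date(2015, 11, 30).toordinal()]}
)

rows = []  # hypothetical name: one dict per campaign
for campaign, df_gp in df.groupby("campaign"):
    model = LinearRegression()  # fresh model for each group
    model.fit(df_gp[["date_ordinal"]], df_gp["shown"])
    rows.append({"campaign": campaign, "prediction": model.predict(target)[0]})

# Build the result once, after the loop.
df_results = pd.DataFrame(rows)
print(df_results)
```

Two things to keep in mind. `datetime.date(2015, 11, 30).toordinal()` evaluates to 735932, so the hard-coded 735947 in the question corresponds to a later date than 2015-11-30. Also, a campaign with a single observation, such as C, gives the regression no information about the slope, so its prediction should be treated with caution.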
