jeangelj

Reputation: 4498

Python create linear regression prediction pandas dataframe by group

I want to create a Python pandas DataFrame column based on a linear regression run in a loop.

This is the source data pandas dataframe df:

campaign    |     date     |    shown 
   A           2015-10-11       363563
   A           2015-10-12       345657
   A           2015-10-13       345346
   B           2015-10-11       23467
   B           2015-10-15       357990
   C           2015-10-11       97808

I want to use linear regression to predict, for each group, the amount shown on 2015-11-30.

So this is the final new prediction dataframe I am looking for:

 campaign |   Prediction(2015-11-30)
      A           ...
      B           ...
      C           ...

my code so far:

df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
model = LinearRegression()
X = df[['date_ordinal']]
y = df.shown
model.fit(X, y)   

df_results = pd.DataFrame()
for (group, df_gp) in df.groupby('campaign'):
   df_results['campaign'] = group
   X=df_gp[['date_ordinal']]
   y=df_gp.shown
   model.fit(X,y)
   coefs = list(zip(X.columns, model.coef_))
   df_results['prediction'] = model.predict(735947)

df_results

However, when I run this code, I only get one prediction; I don't get a dataframe with one column "group" and the predicted values next to it.

Thank You!

Upvotes: 1

Views: 4215

Answers (1)

Karl Anka

Reputation: 2859

Try this:

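# Target date as an ordinal, in a one-row DataFrame because predict() expects 2-D input.
# (The question hard-codes 735947, which is the ordinal of 2015-12-15, not 2015-11-30.)
target = pd.DataFrame({'date_ordinal': [pd.Timestamp('2015-11-30').toordinal()]})

groups = []
results = []
for (group, df_gp) in df.groupby('campaign'):
    X = df_gp[['date_ordinal']]
    y = df_gp.shown
    model.fit(X, y)  # refit the question's LinearRegression on this campaign only
    coefs = list(zip(X.columns, model.coef_))  # slope per feature; not needed for the prediction
    results.append(model.predict(target)[0])
    groups.append(group)

# Build the result in one step instead of adding rows inside the loop
df_results = pd.DataFrame({'campaign': groups, 'prediction': results})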

According to the answers to "add one row in a pandas.DataFrame", adding rows one by one is not the most efficient solution. As those answers also show, data must be inserted at an index; the loop in your code instead overwrites the same two columns on every iteration, so only one prediction survives.
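For reference, the same idea can be written as a self-contained sketch. The helper name `predict_by_group` and its parameters are my own and do not come from the question; it assumes scikit-learn's `LinearRegression` and a `date` column that pandas can parse. It computes the target ordinal from the date instead of hard-coding it, and fits a fresh model per campaign.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def predict_by_group(df, target_date, group_col="campaign",
                     date_col="date", value_col="shown"):
    """Fit one linear regression per group and predict value_col at target_date.

    Hypothetical helper for illustration; not part of pandas or scikit-learn.
    """
    # Put the target date on the same ordinal scale as the training feature.
    target = pd.DataFrame(
        {"date_ordinal": [pd.Timestamp(target_date).toordinal()]}
    )

    rows = []
    for group, df_gp in df.groupby(group_col):
        # One feature: the date expressed as a proleptic Gregorian ordinal.
        dates = pd.to_datetime(df_gp[date_col])
        X = pd.DataFrame({"date_ordinal": dates.map(lambda d: d.toordinal())})
        y = df_gp[value_col]

        # A fresh model per group, so no state leaks between campaigns.
        model = LinearRegression().fit(X, y)
        rows.append({group_col: group,
                     "prediction": model.predict(target)[0]})

    # Build the result once, rather than growing a DataFrame row by row.
    return pd.DataFrame(rows, columns=[group_col, "prediction"])


# The sample data from the question.
df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "date": pd.to_datetime(["2015-10-11", "2015-10-12", "2015-10-13",
                            "2015-10-11", "2015-10-15", "2015-10-11"]),
    "shown": [363563, 345657, 345346, 23467, 357990, 97808],
})

df_results = predict_by_group(df, "2015-11-30")
print(df_results)
```

Note that campaign C has a single observation, so there is no trend to fit for it, and campaign B has only two. Extrapolating six weeks ahead from so few points should be treated with caution.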

Upvotes: 3
