Reputation: 4498
I want to create a python pandas dataframe column based on a linear regression loop
This is the source data pandas dataframe df:
campaign | date       | shown
A        | 2015-10-11 | 363563
A        | 2015-10-12 | 345657
A        | 2015-10-13 | 345346
B        | 2015-10-11 | 23467
B        | 2015-10-15 | 357990
C        | 2015-10-11 | 97808
I want to use linear regression to predict, for each campaign, the amount shown on 2015-11-30.
This is the final prediction dataframe I am looking for:
campaign | Prediction(2015-11-30)
A        | ...
B        | ...
C        | ...
My code so far:
df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
model = LinearRegression()
X = df[['date_ordinal']]
y = df.shown
model.fit(X, y)
df_results = pd.DataFrame()
for (group, df_gp) in df.groupby('campaign'):
    df_results['campaign'] = group
    X = df_gp[['date_ordinal']]
    y = df_gp.shown
    model.fit(X, y)
    coefs = list(zip(X.columns, model.coef_))
    df_results['prediction'] = model.predict(735947)
df_results
However, when I run this code I only get one prediction. I don't get a dataframe with one column "group" and the predicted values next to it.
Thank You!
Upvotes: 1
Views: 4215
Reputation: 2859
Try this:
groups = []
results = []
for (group, df_gp) in df.groupby('campaign'):
    X = df_gp[['date_ordinal']]
    y = df_gp.shown
    model.fit(X, y)  # refit on this campaign's rows only
    coefs = list(zip(X.columns, model.coef_))
    # predict expects a 2D input: one row, one feature
    results.append(model.predict([[735947]])[0])
    groups.append(group)
df_results = pd.DataFrame({'campaign':groups, 'prediction':results})
According to the answers to "add one row in a pandas.DataFrame", adding rows one by one is not the most efficient solution. As those answers also show, data must be inserted at an index, which is why it is simpler to collect the values in lists and build the dataframe once at the end.
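For reference, here is a minimal self-contained sketch of the same idea, assuming pandas and scikit-learn are installed. The names `target_ordinal`, `X_new` and `rows` are just illustrative. It derives the ordinal from the date instead of hard-coding it: 2015-11-30 is ordinal 735932, whereas the 735947 used above corresponds to 2015-12-15. Note also that campaign C has a single row, so its fitted line should simply be flat at that one value.

```python
from datetime import date

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data copied from the question.
df = pd.DataFrame({
    'campaign': ['A', 'A', 'A', 'B', 'B', 'C'],
    'date': pd.to_datetime(['2015-10-11', '2015-10-12', '2015-10-13',
                            '2015-10-11', '2015-10-15', '2015-10-11']),
    'shown': [363563, 345657, 345346, 23467, 357990, 97808],
})
df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())

# Derive the target ordinal from the date rather than typing it by hand.
target_ordinal = date(2015, 11, 30).toordinal()
# One row, one feature, same column name as the training data.
X_new = pd.DataFrame({'date_ordinal': [target_ordinal]})

rows = []
for campaign, df_gp in df.groupby('campaign'):
    model = LinearRegression()  # fresh model per campaign
    model.fit(df_gp[['date_ordinal']], df_gp['shown'])
    rows.append({'campaign': campaign,
                 'prediction': model.predict(X_new)[0]})

# Build the result once, from the collected rows.
df_results = pd.DataFrame(rows)
print(df_results)
```

Creating a new `LinearRegression` inside the loop is a design choice rather than a requirement: refitting a single shared model, as in the code above, gives the same predictions, but a fresh model per group makes it harder to accidentally reuse a fit from another campaign.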
Upvotes: 3