Reputation: 105
This is my first time trying to forecast using basic linear regression in Python. I discovered I had to convert the dates to ordinal dates and then into a 2D NumPy array. I now want to convert the NumPy array back to YYYY/MMM/DD for a usable plot, but I am failing. I have never used NumPy before; x_full_month.map(dt.datetime.fromordinal) does not work, because NumPy arrays have no .map method.
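import datetime as dt
import numpy as np
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# df is an existing pandas DataFrame with 'Date' and 'Cost' columns
df['Date_Ordinal'] = df['Date'].map(dt.datetime.toordinal)
x = df['Date_Ordinal']
y = df['Cost']
# scikit-learn expects a 2D feature array, hence reshape(-1, 1)
x_train = x.values.reshape(-1, 1)
y_train = y.values.reshape(-1, 1)
# the model has to be fitted before predict() can be called
model.fit(x_train, y_train)
y_pred = model.predict(x_train)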
From the predictive model, I'm then creating a new X of ordinal dates for the full month, to get a full month's response:
x_full_month = np.arange(737850, 737880, 1).reshape((-1, 1))
# predict on the new month's ordinals (x_full_month, not the undefined x_new)
y_pred_new = model.predict(x_full_month)
print('predicted response:', y_pred_new.T, sep='\n')
This seems to work, but X is in ordinal dates (as expected). How would I get a nicely formatted X for plotting? Or how do I get this back into a pandas DataFrame, which I'm more familiar with? Or am I going about this in a completely roundabout way?
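To make the failing step concrete, here is a minimal sketch of the conversion I am after, assuming only the x_full_month array from above. The names dates, date_index and labels are made up for illustration, and the month abbreviation produced by %b depends on the locale.

```python
import datetime as dt
import numpy as np
import pandas as pd

# Same 2D array of ordinals as in the question.
x_full_month = np.arange(737850, 737880, 1).reshape((-1, 1))

# An ndarray has no .map, so flatten to 1D and convert element by element.
# int() turns each NumPy integer into a plain Python int first.
dates = [dt.date.fromordinal(int(o)) for o in x_full_month.ravel()]

# Hand the dates to pandas to get a DatetimeIndex that plots nicely.
date_index = pd.to_datetime(dates)

# Optional: string labels in YYYY/MMM/DD form.
labels = date_index.strftime("%Y/%b/%d")
```

date_index can be used directly as the x-axis of a plot, while labels is only needed if the text form itself matters.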
Edit: corrected parameter name
Upvotes: 0
Views: 530
Reputation: 105
Several hours later, I have a solution. I'm still sure I'm going about this inefficiently, but the steps below do work for me.
import datetime as dt
import pandas as pd
# .flatten() turns the 2D NumPy arrays into 1D; the first argument becomes the data, the second the index
df = pd.DataFrame(y_pred_new.flatten(), x_full_month.flatten())
# moves the index into a regular column (pd.DataFrame made x_full_month the index initially)
df.reset_index(inplace=True)
# meaningful column names
df = df.rename(columns={'index': 'ord_date', 0: 'cumul_DN'})
# convert ordinal date to yyyy-mm-dd
df['date'] = df['ord_date'].map(dt.datetime.fromordinal)
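A possibly less roundabout route is to start from real dates and derive the ordinals from them, so the dates never have to be recovered afterwards. The sketch below is only an illustration: the training data is invented, and the names train, forecast and predicted_cost are hypothetical. As far as I can tell, ordinal 737850 is 2021-03-01, so the hard-coded arange in the question stops at 30 March; building the month with pd.date_range avoids that kind of off-by-one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data (made up for illustration): cost grows by 2 per day.
train_dates = pd.date_range("2021-02-01", periods=10, freq="D")
train = pd.DataFrame({"Date": train_dates, "Cost": 2.0 * np.arange(10) + 5.0})

# Ordinals as a 2D feature array, as in the question.
x_train = train["Date"].map(pd.Timestamp.toordinal).to_numpy().reshape(-1, 1)
model = LinearRegression().fit(x_train, train["Cost"].to_numpy())

# Build the forecast month from calendar dates; both endpoints are included.
forecast = pd.DataFrame({"date": pd.date_range("2021-03-01", "2021-03-31", freq="D")})
forecast["ord_date"] = forecast["date"].map(pd.Timestamp.toordinal)

# Predict on the ordinals and keep the result next to the original dates.
forecast["predicted_cost"] = model.predict(forecast[["ord_date"]].to_numpy())
```

forecast then holds the date, the ordinal and the prediction side by side, so the date column can go straight onto the x-axis of a plot.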
Upvotes: 1