Reputation: 472
I want to pass my predictive model values that will be extracted from Pandas DataFrame based on condition and results from models will be placed in Pandas DataFrame.
DataFrame
+------------+--------------+------------+--------------+
| Date | Actual Value | Prediction | Model Values |
+------------+--------------+------------+--------------+
| 02/01/2021 | 0.02 | | |
| 03/01/2021 | 0.06 | | |
| 04/01/2021 | 0.02 | | |
| 05/01/2021 | 0.04 | | |
| 06/01/2021 | 0.04 | | |
| 07/01/2021 | 0.08 | | |
| 08/01/2021 | 0.06 | | |
| 09/01/2021 | 0.02 | 0.03 | 0.03 |
| 10/01/2021 | 0.20 | | |
| 11/01/2021 | 0.02 | | |
| 12/01/2021 | 0.02 | | |
| 13/01/2021 | 0.09 | 0.06 | 0.06 |
| 14/01/2021 | 0.06 | | |
| 15/01/2021 | 0.04 | | |
| 16/01/2021 | 0.06 | | |
| 17/01/2021 | 0.03 | 0.04 | 0.04 |
| 18/01/2021 | 0.03 | | |
| 19/01/2021 | 0.06 | | |
| 20/01/2021 | 0.06 | | |
+------------+--------------+------------+--------------+
Actual Value
is the actual value for the particular date.
Prediction
is the predicted value for that date. (needs to populated)
Model Values
the value that needs to be passed to model to get results. (the tricky part)
The model takes past 7 days values and gives next day's output. Hence the minimum date on which a prediction can be made is 9th Jan
.
To make prediction for 9th Jan
df['Actual Value'].iloc[:7,]
will be passed to the model which will output numpy (1*1) array. The value is then placed in column Prediction
. (i.e. 0.03)
The part where I am stuck
I want to use the predicted value on 9th Jan
and past 6 values to make prediction for 10th Jan
. Which means df['Actual Value'].iloc[1:7,:]
+ df['Prediction'].iloc[7,:]
.
I want to access these values and pass it to model which will give a result numpy (1*1) array that needs to be placed in column Prediction
on Date
10th Jan
.
Now that we have value for 10th Jan
we can make prediction for 11th Jan
based on similar logic use df['Actual Value'].iloc[2:7,:]
+ df['Prediction'].iloc[7:9,:]
. Pass these values to model and get result for 11th Jan
.
Similar process to be followed for predicting 12th Jan
.
But for 13th Jan
we will use Actual Value
for last 7 days to get Prediction
. This part is already done. There will always be a value after every 4 days on 09th, 13th and 17th. The code to do that is as follows:
look_back = 7
look_forward = 1
n_days_pred = 4
pred = 0
predictions = []
while pred <= X_test.shape[0]:
predictions.append(model.predict(X_test[pred].reshape(1, look_back, look_forward)).flatten().tolist())
pred = pred + n_days_pred
predictions_flat = [item for sublist in predictions for item in sublist]
start_date = test_df.Date.min() + dt.timedelta(look_back)
predictions_df = pd.DataFrame(index=pd.date_range(start=start_date, periods=len(predictions), freq='4D'))
predictions_df['Prediction'] = predictions_flat
# Join predictions_df with df
df = analysis_df.set_index('Date').join(predictions_df)
df
from code above is same as shown above.
I am stuck at how to access the values form column Actual Value
& Prediction
to get predictions for 10th, 11th, 12th, 14th, 15th, 16th, 18th, 19th and 20th.
Upvotes: 0
Views: 160
Reputation: 11395
First let’s use the dates to index the dataframe rather than indices.
df = df.set_index('date')
Now computing your every-4-days computation
n_days_pred = 4
look_back = 7
for model_start_date in df.index[look_back::n_days_pred]:
for predict_date in pd.date_range(model_start_date, periods=n_days_pred, freq='D'):
model_input = pd.concat([
# All actual values before model_start_date
df.loc[predict_date - pd.Timedelta(days=look_back):model_start_date - pd.Timedelta(days=1), 'actual value'],
# All predictions since model_start_date
df.loc[model_start_date:predict_date - pd.Timedelta(days=1), 'prediction'],
])
df.loc[predict_date, 'prediction'] = model.predict( .... model_input .... )
If I fill prediction
with only the word pred
and actual value
with only the word val
and then print model_input
from that loop, here is what I get:
2021-01-02 val
2021-01-03 val
2021-01-04 val
2021-01-05 val
2021-01-06 val
2021-01-07 val
2021-01-08 val
Freq: D, dtype: object
2021-01-03 val
2021-01-04 val
2021-01-05 val
2021-01-06 val
2021-01-07 val
2021-01-08 val
2021-01-09 pred
Freq: D, dtype: object
2021-01-04 val
2021-01-05 val
2021-01-06 val
2021-01-07 val
2021-01-08 val
2021-01-09 pred
2021-01-10 pred
Freq: D, dtype: object
2021-01-05 val
2021-01-06 val
2021-01-07 val
2021-01-08 val
2021-01-09 pred
2021-01-10 pred
2021-01-11 pred
Freq: D, dtype: object
2021-01-06 val
2021-01-07 val
2021-01-08 val
2021-01-09 val
2021-01-10 val
2021-01-11 val
2021-01-12 val
Freq: D, dtype: object
2021-01-07 val
2021-01-08 val
2021-01-09 val
2021-01-10 val
2021-01-11 val
2021-01-12 val
2021-01-13 pred
Freq: D, dtype: object
2021-01-08 val
2021-01-09 val
2021-01-10 val
2021-01-11 val
2021-01-12 val
2021-01-13 pred
2021-01-14 pred
Freq: D, dtype: object
2021-01-09 val
2021-01-10 val
2021-01-11 val
2021-01-12 val
2021-01-13 pred
2021-01-14 pred
2021-01-15 pred
Freq: D, dtype: object
2021-01-10 val
2021-01-11 val
2021-01-12 val
2021-01-13 val
2021-01-14 val
2021-01-15 val
2021-01-16 val
Freq: D, dtype: object
2021-01-11 val
2021-01-12 val
2021-01-13 val
2021-01-14 val
2021-01-15 val
2021-01-16 val
2021-01-17 pred
Freq: D, dtype: object
2021-01-12 val
2021-01-13 val
2021-01-14 val
2021-01-15 val
2021-01-16 val
2021-01-17 pred
2021-01-18 pred
Freq: D, dtype: object
2021-01-13 val
2021-01-14 val
2021-01-15 val
2021-01-16 val
2021-01-17 pred
2021-01-18 pred
2021-01-19 pred
Freq: D, dtype: object
Where df
is
df = pd.DataFrame({'prediction': 'pred', 'actual value': 'val'}, index=pd.date_range('2021-01-02', '2021-01-20', freq='D'))
As you can see it’s between 0 and 3 latest pred
(with the dates in the index), then before that val
− and always 7 entries. I don’t think you can do much smarter than a loop, since you’re computing the inputs of the next iterations as you go.
Upvotes: 1