Lopez
Lopez

Reputation: 472

Accessing Panda's DataFrame Columns Based on Condition to Derive Results

I want to pass my predictive model values that will be extracted from Pandas DataFrame based on condition and results from models will be placed in Pandas DataFrame.

DataFrame

+------------+--------------+------------+--------------+
|    Date    | Actual Value | Prediction | Model Values |
+------------+--------------+------------+--------------+
| 02/01/2021 |         0.02 |            |              |
| 03/01/2021 |         0.06 |            |              |
| 04/01/2021 |         0.02 |            |              |
| 05/01/2021 |         0.04 |            |              |
| 06/01/2021 |         0.04 |            |              |
| 07/01/2021 |         0.08 |            |              |
| 08/01/2021 |         0.06 |            |              |
| 09/01/2021 |         0.02 |       0.03 |         0.03 |
| 10/01/2021 |         0.20 |            |              |
| 11/01/2021 |         0.02 |            |              |
| 12/01/2021 |         0.02 |            |              |
| 13/01/2021 |         0.09 |       0.06 |         0.06 |
| 14/01/2021 |         0.06 |            |              |
| 15/01/2021 |         0.04 |            |              |
| 16/01/2021 |         0.06 |            |              |
| 17/01/2021 |         0.03 |       0.04 |         0.04 |
| 18/01/2021 |         0.03 |            |              |
| 19/01/2021 |         0.06 |            |              |
| 20/01/2021 |         0.06 |            |              |
+------------+--------------+------------+--------------+

Actual Value is the actual value for the particular date. Prediction is the predicted value for that date. (needs to populated) Model Values the value that needs to be passed to model to get results. (the tricky part)

The model takes past 7 days values and gives next day's output. Hence the minimum date on which a prediction can be made is 9th Jan. To make prediction for 9th Jan df['Actual Value'].iloc[:7,] will be passed to the model which will output numpy (1*1) array. The value is then placed in column Prediction. (i.e. 0.03)

The part where I am stuck

I want to use the predicted value on 9th Jan and past 6 values to make prediction for 10th Jan. Which means df['Actual Value'].iloc[1:7,:] + df['Prediction'].iloc[7,:].

I want to access these values and pass it to model which will give a result numpy (1*1) array that needs to be placed in column Prediction on Date 10th Jan.

Now that we have value for 10th Jan we can make prediction for 11th Jan based on similar logic use df['Actual Value'].iloc[2:7,:] + df['Prediction'].iloc[7:9,:]. Pass these values to model and get result for 11th Jan.

Similar process to be followed for predicting 12th Jan.

But for 13th Jan we will use Actual Value for last 7 days to get Prediction. This part is already done. There will always be a value after every 4 days on 09th, 13th and 17th. The code to do that is as follows:

look_back = 7
look_forward = 1
n_days_pred = 4
pred = 0
predictions = []
while pred <= X_test.shape[0]:
    predictions.append(model.predict(X_test[pred].reshape(1, look_back, look_forward)).flatten().tolist())
    pred = pred + n_days_pred
predictions_flat = [item for sublist in predictions for item in sublist]
start_date = test_df.Date.min() + dt.timedelta(look_back)
predictions_df = pd.DataFrame(index=pd.date_range(start=start_date, periods=len(predictions), freq='4D'))
predictions_df['Prediction'] = predictions_flat

# Join predictions_df with df 
df = analysis_df.set_index('Date').join(predictions_df)

df from code above is same as shown above.

I am stuck at how to access the values form column Actual Value & Prediction to get predictions for 10th, 11th, 12th, 14th, 15th, 16th, 18th, 19th and 20th.

Upvotes: 0

Views: 160

Answers (1)

Cimbali
Cimbali

Reputation: 11395

First let’s use the dates to index the dataframe rather than indices.

df = df.set_index('date')

Now computing your every-4-days computation

n_days_pred = 4
look_back = 7

for model_start_date in df.index[look_back::n_days_pred]:
    for predict_date in pd.date_range(model_start_date, periods=n_days_pred, freq='D'):
 
        model_input = pd.concat([
           # All actual values before model_start_date
            df.loc[predict_date - pd.Timedelta(days=look_back):model_start_date - pd.Timedelta(days=1), 'actual value'],

            # All predictions since model_start_date
            df.loc[model_start_date:predict_date - pd.Timedelta(days=1), 'prediction'],
        ])
        df.loc[predict_date, 'prediction'] = model.predict( .... model_input .... )

If I fill prediction with only the word pred and actual value with only the word val and then print model_input from that loop, here is what I get:

2021-01-02    val
2021-01-03    val
2021-01-04    val
2021-01-05    val
2021-01-06    val
2021-01-07    val
2021-01-08    val
Freq: D, dtype: object
2021-01-03     val
2021-01-04     val
2021-01-05     val
2021-01-06     val
2021-01-07     val
2021-01-08     val
2021-01-09    pred
Freq: D, dtype: object
2021-01-04     val
2021-01-05     val
2021-01-06     val
2021-01-07     val
2021-01-08     val
2021-01-09    pred
2021-01-10    pred
Freq: D, dtype: object
2021-01-05     val
2021-01-06     val
2021-01-07     val
2021-01-08     val
2021-01-09    pred
2021-01-10    pred
2021-01-11    pred
Freq: D, dtype: object
2021-01-06    val
2021-01-07    val
2021-01-08    val
2021-01-09    val
2021-01-10    val
2021-01-11    val
2021-01-12    val
Freq: D, dtype: object
2021-01-07     val
2021-01-08     val
2021-01-09     val
2021-01-10     val
2021-01-11     val
2021-01-12     val
2021-01-13    pred
Freq: D, dtype: object
2021-01-08     val
2021-01-09     val
2021-01-10     val
2021-01-11     val
2021-01-12     val
2021-01-13    pred
2021-01-14    pred
Freq: D, dtype: object
2021-01-09     val
2021-01-10     val
2021-01-11     val
2021-01-12     val
2021-01-13    pred
2021-01-14    pred
2021-01-15    pred
Freq: D, dtype: object
2021-01-10    val
2021-01-11    val
2021-01-12    val
2021-01-13    val
2021-01-14    val
2021-01-15    val
2021-01-16    val
Freq: D, dtype: object
2021-01-11     val
2021-01-12     val
2021-01-13     val
2021-01-14     val
2021-01-15     val
2021-01-16     val
2021-01-17    pred
Freq: D, dtype: object
2021-01-12     val
2021-01-13     val
2021-01-14     val
2021-01-15     val
2021-01-16     val
2021-01-17    pred
2021-01-18    pred
Freq: D, dtype: object
2021-01-13     val
2021-01-14     val
2021-01-15     val
2021-01-16     val
2021-01-17    pred
2021-01-18    pred
2021-01-19    pred
Freq: D, dtype: object

Where df is

df = pd.DataFrame({'prediction': 'pred', 'actual value': 'val'}, index=pd.date_range('2021-01-02', '2021-01-20', freq='D'))

As you can see it’s between 0 and 3 latest pred (with the dates in the index), then before that val − and always 7 entries. I don’t think you can do much smarter than a loop, since you’re computing the inputs of the next iterations as you go.

Upvotes: 1

Related Questions