Reputation: 311
I have a DataFrame with multiple columns, each column contains NaN values at different index positions. The indexes are 5 minute frequency datetimes.
Take the first column: Select the rows until the first NaN row, or until the first NaN range if adjacent NaN values follow each other.
I'd like to forecast from the value sequence the NaN value, or NaN values.
The next iteration: Select the rows (from the beginning) until the next NaN value or NaN adjacent range, and select the NaN values/range as well. Forecast the NaN values/range.
This goes on until the column doesn't contain NaNs.
import pandas as pd
import numpy as np
column = ['values']
data = [1,2,3,4, np.nan, 6, 7, 8, np.nan, np.nan, np.nan, 12, 13, 14, np.nan, np.nan, 17, 18]
DateList_ = pd.date_range(start='2018-10-29 10:00:00', end='2018-10-29 11:25:00 ', freq='5T')
df_ = pd.DataFrame(data=data, columns=column, index=DateList_)
First iteration: values = [1,2,3,4], nan_periods=1 -> Forecast [5] from values
Second iteration: values = [1,2,3,4,5,6,7,8] nan_periods=3 -> Forecast [9,10,11] from values
Third iteration: values = [1,2,3,4,5,6,7,8,9,10,11,12,13,14] -> Forecast [15,16] from values
Iteration stops, no more NaN values.
The forecasting is done with Prophet. I'd like to ask some help with the selection part, to iteratively select rows that contain values, and the following nan/s.
Unfortunately, I can't use interpolation! The real sequences I have are far more complex and not LINEAR like this dummy example! This example is only used to demonstrate the selection!
Upvotes: 1
Views: 427
Reputation: 862406
Idea is create groups with consecutive NaN
s in Series
g
, then loop by groupby
and get variables for forecast, for select by position is necessary use Index.get_loc
for position by first value of groups and for position from column name, then set values by loc
for replace NaN
s:
s = df_['values'].isna()
g = s.ne(s.shift()).cumsum()[s]
for i, x in df_.groupby(g):
nan_periods = len(x)
values = df_.iloc[:df_.index.get_loc(x.index[0]), df_.columns.get_loc('values')]
print (nan_periods)
print (values)
#sample data
Forecast = 10
df_.loc[x.index, 'values'] = Forecast
print (df_)
values
2018-10-29 10:00:00 1.0
2018-10-29 10:05:00 2.0
2018-10-29 10:10:00 3.0
2018-10-29 10:15:00 4.0
2018-10-29 10:20:00 10.0
2018-10-29 10:25:00 6.0
2018-10-29 10:30:00 7.0
2018-10-29 10:35:00 8.0
2018-10-29 10:40:00 10.0
2018-10-29 10:45:00 10.0
2018-10-29 10:50:00 10.0
2018-10-29 10:55:00 12.0
2018-10-29 11:00:00 13.0
2018-10-29 11:05:00 14.0
2018-10-29 11:10:00 10.0
2018-10-29 11:15:00 10.0
2018-10-29 11:20:00 17.0
2018-10-29 11:25:00 18.0
Upvotes: 1