Reputation: 59
I would like to build a sliding window with NumPy instead of pandas.rolling, primarily for speed, and I have not found this covered anywhere. The window must respect both a minimum and a maximum number of elements, and it should return NaN when fewer than the minimum number are available. This mirrors pandas.rolling with its window (maximum size) and min_periods arguments.
For example, with min_periods = 3 and max_periods = 7, the intended windows are:
index values intended_window
0 10 np.nan
1 11 np.nan
2 12 [10,11,12]
3 13 [10,11,12,13]
4 14 [10,11,12,13,14]
5 15 [10,11,12,13,14,15]
6 16 [10,11,12,13,14,15,16]
7 17 [11,12,13,14,15,16,17]
8 18 [12,13,14,15,16,17,18]
9 19 [13,14,15,16,17,18,19]
I have found examples of how to build a sliding window when no minimum or maximum is required, e.g.:
import numpy as np

def rolling_window(a, window):
    # one row per full window along the last axis; rows are views, not copies
    shp = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shp, strides=strides)
Does anyone know how I can expand this to return windows as in the example above?
Upvotes: 0
Views: 767
Reputation: 135
Please try the following.
import numpy as np

def dataframe_striding(dataframe, window):
    '''
    Parameters
    ----------
    dataframe : Input DataFrame, in this case df with columns ['index', 'values'] present.
    window : Tuple (minimum, maximum) giving the bounds on the window size.
    Returns
    -------
    dataframe : pandas DataFrame with an added 'rolling_output' column.
    '''
    lower_end, upper_end = window
    if lower_end > upper_end:
        raise ValueError('Check window size!')
    results = []
    for i in range(dataframe.shape[0]):
        l = list(dataframe['values'][:i+1])
        if len(l) < lower_end:  # fewer than the minimum: no window
            results.append(np.nan)
        elif len(l) <= upper_end:  # between minimum and maximum: keep everything
            results.append(l)
        else:  # more than the maximum: keep only the last upper_end values
            results.append(l[-upper_end:])
    dataframe['rolling_output'] = results  # adds the output column to the input dataframe
    return dataframe

# run the function above
final_df = dataframe_striding(df, window=(4, 6))
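Since the question asks for speed, here is a loop-free sketch of the same idea. It is not part of the function above: the name padded_windows and the fixed-width, NaN-padded output layout are my own assumptions, and it relies on numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or later. Each row holds the window ending at that index, right-aligned, with NaN filling the unused leading slots; rows with fewer than min_periods values are NaN throughout.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def padded_windows(values, min_periods, max_periods):
    """Hypothetical helper: return an (n, max_periods) array of trailing windows."""
    values = np.asarray(values, dtype=float)
    # Left-pad with NaN so that every index has a full-width window.
    padded = np.concatenate([np.full(max_periods - 1, np.nan), values])
    # sliding_window_view returns a read-only view, so copy before writing.
    windows = sliding_window_view(padded, max_periods).copy()
    # Number of real (non-padding) positions in each row.
    counts = np.minimum(np.arange(1, len(values) + 1), max_periods)
    # Rows that do not reach the minimum are blanked out completely.
    windows[counts < min_periods] = np.nan
    return windows

windows = padded_windows(np.arange(10, 20), min_periods=3, max_periods=7)
print(windows[2])  # four leading NaNs followed by 10, 11, 12
print(windows[9])  # 13 through 19
```

A rectangular array like this can be reduced with NaN-aware functions such as np.nanmean(windows, axis=1); note that np.nanmean emits a RuntimeWarning for rows that are entirely NaN.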
Upvotes: 1
Reputation: 630
import numpy as np

values = np.linspace(1, 10, num=10)
window_column = []
for i in range(len(values)):
    # the window ends at the current element and holds at most 7 values
    t = max(i - 6, 0)
    window = values[t:i + 1]
    if len(window) < 3:
        window_column.append(np.nan)
    else:
        window_column.append(window)
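If the windows are only needed to compute an aggregate, they may not have to be materialized at all. The following is a sketch under that assumption, using a rolling mean as the example; the function name rolling_mean and the cumulative-sum approach are mine, not something taken from the question. It assumes the input itself contains no NaN values.

```python
import numpy as np

def rolling_mean(values, min_periods, max_periods):
    """Hypothetical helper: trailing mean with minimum and maximum window sizes."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # csum[k] is the sum of the first k values, so csum[0] == 0.
    csum = np.concatenate([[0.0], np.cumsum(values)])
    ends = np.arange(1, n + 1)                   # exclusive end of each window
    starts = np.maximum(ends - max_periods, 0)   # inclusive start, clipped at 0
    counts = ends - starts                       # actual window length
    means = (csum[ends] - csum[starts]) / counts
    # Windows shorter than the minimum produce NaN.
    means[counts < min_periods] = np.nan
    return means

print(rolling_mean(np.arange(10, 20), min_periods=3, max_periods=7))
```

For long inputs with large magnitudes, the cumulative sum can accumulate floating-point error, so the result may differ slightly from a mean computed window by window.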
Upvotes: 0