datam
datam

Reputation: 285

Easiest way to get Pandas rolling window of values

I have a dataset. I want a window of 5 values. Does pandas have a native function that will give me a rolling window of 5 values until there are no longer 5 values that it can use? I want these to be rows. I also want the new label to be the middle of the 5 values.

Input DataFrame

first   label
0   1   0
1   2   1
2   3   2
3   4   3
4   5   4
5   6   5

Output DataFrame desired:

first   label
0   [1, 2, 3, 4, 5] 2
1   [2, 3, 4, 5, 6] 3

I have tried using the .rolling function and haven't been successful.

Upvotes: 0

Views: 886

Answers (2)

sammywemmy
sammywemmy

Reputation: 28729

An alternative, more of a hack, I don't think pandas has a native function for what you want.

Convert dataframe to numpy, transpose dataframe and pull out labels and array, using a list comprehension:

 M = df.to_numpy().T

 outcome = [(M[0,i:5+i],
             M[1][(5+i)//2])
             for i in range(0,M.shape[1])
             if 5+i <=M.shape[1]
            ]

 print(outcome)

 [(array([1, 2, 3, 4, 5]), 2), (array([2, 3, 4, 5, 6]), 3)]

 pd.DataFrame(outcome,columns=['first','label'])

      first           label
 0  [1, 2, 3, 4, 5]     2
 1  [2, 3, 4, 5, 6]     3

Upvotes: 0

jezrael
jezrael

Reputation: 863701

You can use strides and for label get position of middle value and by numpy indexing set value:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(df['first'].to_numpy(), 5)
print (a)
[[ 1 2 3  4  5]
 [2 3  4  5 6]]

#get positions of middle value
i = rolling_window(np.arange(len(df)), 5)[:, 2]
print (i)
[2 3]

df = pd.DataFrame({'first':a.tolist(), 
                   'label': df['label'].to_numpy()[i]})
print (df)
             first  label
0   [1, 2, 3, 4, 5]     2
1   [2, 3, 4, 5, 6]     3

You can more optimalize code for run strides only one:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


#get positions
idx = rolling_window(np.arange(len(df)), 5)
print (idx)
[[0 1 2 3 4]
 [1 2 3 4 5]]

df = pd.DataFrame({'first': df['first'].to_numpy()[idx].tolist(), 
                   'label': df['label'].to_numpy()[idx][:, 2]})
print (df)

             first  label
0  [1, 2, 3, 4, 5]      2
1  [2, 3, 4, 5, 6]      3

Upvotes: 1

Related Questions