ℕʘʘḆḽḘ

Reputation: 19395

Rolling sum over the last 2 seconds

Consider this simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'mytime' : [pd.to_datetime('2018-01-01 14:34:12.340'),
                             pd.to_datetime('2018-01-01 14:34:13.0'),
                             pd.to_datetime('2018-01-01 14:34:15.342'),
                             pd.to_datetime('2018-01-01 14:34:16.42'),
                             pd.to_datetime('2018-01-01 14:34:28.742')],
                    'myvalue' : [1,2,np.nan,3,1],
                    'mychart' : ['a','b','c','d','e']})

df.set_index('mytime', inplace = True)
df
Out[142]: 
                        mychart  myvalue
mytime                                  
2018-01-01 14:34:12.340       a      1.0
2018-01-01 14:34:13.000       b      2.0
2018-01-01 14:34:15.342       c      NaN
2018-01-01 14:34:16.420       d      3.0
2018-01-01 14:34:28.742       e      1.0

Here I want to use rolling to compute the rolling sum of myvalue over the last 2 seconds.

Yes, the last two seconds, not the last two observations :)

This is supposed to work, but two seemingly equivalent calls give different results:

df['myrol1'] = df.myvalue.rolling(window = '2s', closed = 'right').apply(lambda x: x.sum())
df['myrol2'] = df.myvalue.rolling(window = '2s', closed = 'right').sum()

df
Out[152]: 
                        mychart  myvalue  myrol1  myrol2
mytime                                                  
2018-01-01 14:34:12.340       a      1.0     1.0     1.0
2018-01-01 14:34:13.000       b      2.0     3.0     3.0
2018-01-01 14:34:15.342       c      NaN     NaN     NaN
2018-01-01 14:34:16.420       d      3.0     NaN     3.0
2018-01-01 14:34:28.742       e      1.0     1.0     1.0

What is going on with apply here? Anything that goes through apply seems to misbehave. For instance:

df.mychart.rolling(window = '2s', closed = 'right').apply(lambda x: ' '.join(x))
Out[160]: 
mytime
2018-01-01 14:34:12.340    a
2018-01-01 14:34:13.000    b
2018-01-01 14:34:15.342    c
2018-01-01 14:34:16.420    d
2018-01-01 14:34:28.742    e
Name: mychart, dtype: object

Thanks!

Upvotes: 2

Views: 240

Answers (1)

BENY

Reputation: 323316

You may want to use np.nansum inside apply:

df.myvalue.rolling(window = '2s', closed = 'right').apply(lambda x: np.nansum(x))
Out[248]: 
mytime
2018-01-01 14:34:12.340    1.0
2018-01-01 14:34:13.000    3.0
2018-01-01 14:34:15.342    NaN
2018-01-01 14:34:16.420    3.0
2018-01-01 14:34:28.742    1.0
Name: myvalue, dtype: float64

The original values contain NaN, and a plain sum over an array containing NaN returns NaN, whereas np.nansum treats NaN as zero:

np.sum([0.5, np.nan])
Out[249]: nan
np.nansum([0.5, np.nan])
Out[250]: 0.5
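
To put the three variants side by side, here is a minimal sketch that rebuilds the myvalue column as a standalone Series. It assumes a pandas version where apply accepts the raw argument. raw=True is passed explicitly so that the lambda receives a NumPy array, which is what produces the NaN in the question. As far as I know, recent pandas versions pass a Series by default, and Series.sum() skips NaN, so the discrepancy may not reproduce there without raw=True. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same timestamps and values as the question's myvalue column.
times = pd.to_datetime([
    '2018-01-01 14:34:12.340',
    '2018-01-01 14:34:13.000',
    '2018-01-01 14:34:15.342',
    '2018-01-01 14:34:16.420',
    '2018-01-01 14:34:28.742',
])
s = pd.Series([1, 2, np.nan, 3, 1], index=times, name='myvalue')

roll = s.rolling(window='2s', closed='right')

# Built-in rolling sum: skips NaN inside each window.
builtin = roll.sum()

# apply with a raw ndarray: ndarray.sum() propagates NaN.
plain = roll.apply(lambda x: x.sum(), raw=True)

# apply with np.nansum: NaN is treated as zero inside each window.
nan_aware = roll.apply(np.nansum, raw=True)

print(pd.DataFrame({'builtin': builtin, 'plain': plain, 'nan_aware': nan_aware}))
```

The row at 14:34:16.420 is the one that differs: its 2-second window also contains the NaN from 14:34:15.342, so the plain sum returns NaN while the other two return 3.0. The row at 14:34:15.342 stays NaN in all three because its window holds no valid observation.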

Upvotes: 2
