Reputation: 4546
Is there any way to stop pandas.TimeGrouper() from returning an incomplete group (ts1)? Currently I compute the number of rows in the incomplete group and then use .ix to remove those rows (ts2). Is there a better (or built-in) way of doing this? This was the only pandas.TimeGrouper documentation that I was able to find.
import numpy as np
import pandas as pd
pd.__version__
Out [1]: '0.15.0'
rng = pd.date_range('1/1/2013', periods=365, freq='D')
random_numbers = np.arange(0, len(rng))
ts = pd.Series(random_numbers, index=rng)
num_days = 3
num_rows_to_drop = len(rng) % num_days
days = 'D'
timedelta_for_grouping = str(num_days) + days
ts1 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median')
ts2 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median').ix[:-num_rows_to_drop]
print ts1.tail(), ts2.tail()
Out [2]:
2013-12-27 361.0
2013-12-28 361.0
2013-12-29 361.0
2013-12-30 363.5
2013-12-31 363.5
Freq: D, dtype: float64
2013-12-25 358
2013-12-26 358
2013-12-27 361
2013-12-28 361
2013-12-29 361
Freq: D, dtype: float64
Upvotes: 3
Views: 787
Reputation: 128948
The easiest way is to filter out any group with fewer rows than the resample period (3 days here), then group the remaining rows again and apply the transform:
In [47]: g = pd.TimeGrouper(timedelta_for_grouping)
In [48]: ts.groupby(g).filter(lambda x: len(x) >= 3).groupby(g).transform('median')
Out[48]:
2013-01-01 1
2013-01-02 1
2013-01-03 1
2013-01-04 4
2013-01-05 4
2013-01-06 4
2013-01-07 7
2013-01-08 7
2013-01-09 7
2013-01-10 10
2013-01-11 10
2013-01-12 10
2013-01-13 13
2013-01-14 13
2013-01-15 13
...
2013-12-15 349
2013-12-16 349
2013-12-17 349
2013-12-18 352
2013-12-19 352
2013-12-20 352
2013-12-21 355
2013-12-22 355
2013-12-23 355
2013-12-24 358
2013-12-25 358
2013-12-26 358
2013-12-27 361
2013-12-28 361
2013-12-29 361
Freq: D, Length: 363
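For readers on a recent pandas release, where pd.TimeGrouper and .ix are no longer available, the same filter-then-transform idea can be sketched with pd.Grouper(freq=...). This is a minimal sketch and not part of the original answer; it assumes a pandas version that provides pd.Grouper, and the names freq, complete and result are only illustrative.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2013-01-01", periods=365, freq="D")
ts = pd.Series(np.arange(len(rng)), index=rng)

num_days = 3
freq = f"{num_days}D"  # illustrative name for the bin width

# Drop every group that has fewer than num_days rows (the trailing partial bin).
complete = ts.groupby(pd.Grouper(freq=freq)).filter(lambda x: len(x) >= num_days)

# Re-group the remaining rows and broadcast each group's median back onto them.
result = complete.groupby(pd.Grouper(freq=freq)).transform("median")

print(result.tail())
```

Unlike slicing off a precomputed number of rows, the filter also works when the series length is an exact multiple of the period, since no group is dropped in that case.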
Upvotes: 4