Reputation: 16478
Let's say I have a pd.DataFrame whose index is a pd.DatetimeIndex.
I would now like to group pairs of consecutive months, i.e. get all overlapping pairs. That is, I would like to have one group with (date in (January, February)), then one with (date in (February, March)), etc. The natural way seemed to be pd.TimeGrouper, but I couldn't find a way to make that work.
Here is a sample data set. Note that there is more than one observation per month (and the counts vary over time), so that df.rolling() and pd.rolling_apply() do not appear to be valid alternatives:
year month cpsidp
date
2000-01-01 2000 1 19981003169301
2000-02-01 2000 2 20000200000101
2000-02-01 2000 2 20000200000102
2000-02-01 2000 2 20000200000103
2000-02-01 2000 2 20000200000104
2000-02-01 2000 2 20000200000105
2000-03-01 2000 3 19981203124802
2000-04-01 2000 4 20000400000101
2000-05-01 2000 5 19990200854301
2000-06-01 2000 6 19990300018604
2000-07-01 2000 7 20000400000101
2000-08-01 2000 8 19990502683801
2000-09-01 2000 9 19990600006901
2000-10-01 2000 10 19990700006501
2000-11-01 2000 11 19990800083001
2000-12-01 2000 12 19991100000301
2001-01-01 2001 1 19991100000301
2001-02-01 2001 2 19991100002701
2001-03-01 2001 3 20000205949101
2001-04-01 2001 4 20010100107701
2001-05-01 2001 5 20000204516501
2001-06-01 2001 6 20000300112801
2001-07-01 2001 7 20000400000101
2001-08-01 2001 8 20000505217801
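For a reproducible setup, a frame with this shape can be built along these lines (a sketch; only the first eight rows of the sample are reproduced):
```python
import pandas as pd

# Rebuild a slice of the sample frame above (first eight rows).
idx = pd.DatetimeIndex(
    ["2000-01-01"] + ["2000-02-01"] * 5 + ["2000-03-01", "2000-04-01"],
    name="date",
)
df = pd.DataFrame(
    {
        "year": idx.year,
        "month": idx.month,
        "cpsidp": [
            19981003169301, 20000200000101, 20000200000102, 20000200000103,
            20000200000104, 20000200000105, 19981203124802, 20000400000101,
        ],
    },
    index=idx,
)
```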
Here's how I would create the groups in a very un-pandas, un-fancy way:
dates = df.index.unique()
for i, date in enumerate(dates):
    if i == len(dates) - 1:
        # last date: no next month to pair with
        break
    date1, date2 = date, dates[i + 1]
    group = pd.concat((df.loc[[date1]], df.loc[[date2]]), axis=0)
    print(group)
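The same pairing can be written a bit more compactly by zipping the unique dates with themselves shifted by one (a sketch with made-up data in place of the sample frame; .loc[[d]] keeps each slice a DataFrame even when a month has a single row):
```python
import pandas as pd

# Toy stand-in for the sample frame: one row in January, two in
# February, one in March (values are made up).
idx = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-02-01", "2000-03-01"])
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=idx)

dates = df.index.unique()
pairs = [
    pd.concat((df.loc[[d1]], df.loc[[d2]]), axis=0)
    for d1, d2 in zip(dates[:-1], dates[1:])
]
```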
Upvotes: 3
Views: 175
Reputation: 294258
It's unfortunate that rolling('2M') doesn't work. Anyway, your main reason for not being able to use rolling(2) is that you have more than one observation per month. Depending on how you want to aggregate per month pair, you could first aggregate every month (giving yourself one observation per month), then use rolling(2):
df.groupby(pd.TimeGrouper('M')).first().rolling(2).mean()
This works nicely for many aggregate stats like 'max', 'min', 'sum', 'count', 'size', 'first', 'last'. 'mean', 'std', etc. would take special care. Like, you'd have to use 'sum' and 'count' and calculate 'mean' and 'std' yourself, but that could be done.
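A minimal sketch of that 'sum'/'count' route for the mean, on made-up data (pd.Grouper(freq='M'), the successor of pd.TimeGrouper, is used here):
```python
import pandas as pd

# Made-up data: one observation in January, two in February, one in March.
idx = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-02-15", "2000-03-01"])
df = pd.DataFrame({"x": [1.0, 2.0, 4.0, 9.0]}, index=idx)

# Keep per-month sum and count instead of a pre-computed mean ...
monthly = df.groupby(pd.Grouper(freq="M"))["x"].agg(["sum", "count"])

# ... so the exact mean over each two-month window can be rebuilt:
rolled = monthly.rolling(2).sum()
pair_mean = rolled["sum"] / rolled["count"]
```
A rolling(2).mean() of pre-computed monthly means would instead weight each month equally, regardless of how many observations it contains.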
Upvotes: 3
Reputation: 210832
Try this:
In [171]: (df.assign(m1=df.index.year*10**2+df.index.month, m2=df.index.year*10**2+df.index.month+1)
...: .groupby(['m1', 'm2'])
...: .agg({'month':['min','max','size']})
...: )
...:
Out[171]:
month
min max size
m1 m2
200001 200002 1 1 1
200002 200003 2 2 5
200003 200004 3 3 1
200004 200005 4 4 1
200005 200006 5 5 1
200006 200007 6 6 1
200007 200008 7 7 1
200008 200009 8 8 1
200009 200010 9 9 1
200010 200011 10 10 1
200011 200012 11 11 1
200012 200013 12 12 1
200101 200102 1 1 1
200102 200103 2 2 1
200103 200104 3 3 1
200104 200105 4 4 1
200105 200106 5 5 1
200106 200107 6 6 1
200107 200108 7 7 1
200108 200109 8 8 1
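One caveat with the m1/m2 key: each row still lands in exactly one group, so a group never mixes rows from both of its months (compare the size column with the loop in the question). A sketch of a variant that duplicates every row into the two pairs it belongs to, using pd.Period so year boundaries roll over correctly (made-up data):
```python
import pandas as pd

# Made-up data: one row in December 2000, two rows in January 2001.
idx = pd.to_datetime(["2000-12-01", "2001-01-01", "2001-01-15"])
df = pd.DataFrame({"x": [1, 2, 3]}, index=idx)

p = df.index.to_period("M")
# Each row is tagged twice: with the pair starting in its own month and
# with the pair starting one month earlier, then both copies are stacked.
both = pd.concat([df.assign(pair=p), df.assign(pair=p - 1)])
sizes = both.groupby("pair").size()
```
As with the loop in the question, the first and last keys correspond to incomplete pairs and may need to be dropped.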
Upvotes: 3