FooBar
FooBar

Reputation: 16478

Group by month-pairs

Let's say I have a pd.DataFrame where the index is given by pd.DateTimeIndex.

I would like to now group pairs monthly, i.e. get all the pairs. That is, I would like to have one group with (date in (January, February)), and then (date in (February, March)), etc. The natural way should be pd.TimeGrouper, but I couldn't find a way to make that work.

Here is a sample data set. Note that is more than one observation per month (numbers vary over time), such that pd.rolling() and pd.rolling_apply() appear not to be valid alternatives:

            year  month          cpsidp
date                                   
2000-01-01  2000      1  19981003169301
2000-02-01  2000      2  20000200000101
2000-02-01  2000      2  20000200000102
2000-02-01  2000      2  20000200000103
2000-02-01  2000      2  20000200000104
2000-02-01  2000      2  20000200000105
2000-03-01  2000      3  19981203124802
2000-04-01  2000      4  20000400000101
2000-05-01  2000      5  19990200854301
2000-06-01  2000      6  19990300018604
2000-07-01  2000      7  20000400000101
2000-08-01  2000      8  19990502683801
2000-09-01  2000      9  19990600006901
2000-10-01  2000     10  19990700006501
2000-11-01  2000     11  19990800083001
2000-12-01  2000     12  19991100000301
2001-01-01  2001      1  19991100000301
2001-02-01  2001      2  19991100002701
2001-03-01  2001      3  20000205949101
2001-04-01  2001      4  20010100107701
2001-05-01  2001      5  20000204516501
2001-06-01  2001      6  20000300112801
2001-07-01  2001      7  20000400000101
2001-08-01  2001      8  20000505217801

Here's how I would create the groups in a very un-pandas un-fancy way:

    dates = df.index.unique()
    for i, date in enumerate(dates):
        if i == len(dates) - 1:
            # last group: no next-group, break
            break
        date1, date2 = date, dates[i+1]
        group = pd.concat((df.loc[date1], df.loc[date2]), axis=0)
        print(group)

Upvotes: 3

Views: 175

Answers (2)

piRSquared
piRSquared

Reputation: 294258

It's unfortunate that rolling('2M') doesn't work. Anyway, your main reason for not being able to use rolling(2) is that you have more than one observation per month. Depending on how you want to aggregate per month pair, you could first aggregate every month (giving yourself one observation per month) then use rolling(2)

df.groupby(pd.TimeGrouper('M')).first().rolling(2).mean()

enter image description here


This works nicely for man aggregate stats like 'max', 'min', 'sum', 'count', 'size', 'first', 'last'.

'mean', 'std', ... etc. would take special care. Like, you'd have to use 'sum' and 'count' and caluculate 'mean' and 'std' yourself but that could be done.

Upvotes: 3

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Try this:

In [171]: (df.assign(m1=df.index.year*10**2+df.index.month, m2=df.index.year*10**2+df.index.month+1)
     ...:    .groupby(['m1', 'm2'])
     ...:    .agg({'month':['min','max','size']})
     ...: )
     ...:
Out[171]:
              month
                min max size
m1     m2
200001 200002     1   1    1
200002 200003     2   2    5
200003 200004     3   3    1
200004 200005     4   4    1
200005 200006     5   5    1
200006 200007     6   6    1
200007 200008     7   7    1
200008 200009     8   8    1
200009 200010     9   9    1
200010 200011    10  10    1
200011 200012    11  11    1
200012 200013    12  12    1
200101 200102     1   1    1
200102 200103     2   2    1
200103 200104     3   3    1
200104 200105     4   4    1
200105 200106     5   5    1
200106 200107     6   6    1
200107 200108     7   7    1
200108 200109     8   8    1

Upvotes: 3

Related Questions