mikeL

Reputation: 1114

groupby DataFrame with new column representing the group

I have a DataFrame with a timestamp column:

from datetime import datetime
import pandas as pd

d1 = pd.DataFrame({'a': [datetime(2015,1,1,20,2,1), datetime(2015,1,1,20,14,58),
    datetime(2015,1,1,20,17,5), datetime(2015,1,1,20,31,5),
    datetime(2015,1,1,20,34,28), datetime(2015,1,1,20,37,51), datetime(2015,1,1,20,41,19),
    datetime(2015,1,1,20,49,4), datetime(2015,1,1,20,59,21)], 'b': [2,4,26,22,45,3,8,121,34]})


          a              b
0 2015-01-01 20:02:01    2
1 2015-01-01 20:14:58    4
2 2015-01-01 20:17:05   26
3 2015-01-01 20:31:05   22
4 2015-01-01 20:34:28   45
5 2015-01-01 20:37:51    3
6 2015-01-01 20:41:19    8
7 2015-01-01 20:49:04  121
8 2015-01-01 20:59:21   34

I can group the rows into 15-minute intervals like this:

d2=d1.set_index('a')

# pd.Grouper(freq=...) replaces pd.TimeGrouper, which later pandas releases removed
d3=d2.groupby(pd.Grouper(freq='15Min'))

The number of rows in each group is given by

d3.size()

a
2015-01-01 20:00:00    2
2015-01-01 20:15:00    1
2015-01-01 20:30:00    4
2015-01-01 20:45:00    2
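
To make the bin assignment visible row by row, here is a minimal sketch that floors each timestamp to the start of its 15-minute interval and counts the rows per interval. It assumes a reasonably recent pandas, and the name bin_start is only illustrative. For this data the floored timestamps should line up with the group labels shown above.

```python
from datetime import datetime
import pandas as pd

d1 = pd.DataFrame({
    'a': [datetime(2015, 1, 1, 20, 2, 1), datetime(2015, 1, 1, 20, 14, 58),
          datetime(2015, 1, 1, 20, 17, 5), datetime(2015, 1, 1, 20, 31, 5),
          datetime(2015, 1, 1, 20, 34, 28), datetime(2015, 1, 1, 20, 37, 51),
          datetime(2015, 1, 1, 20, 41, 19), datetime(2015, 1, 1, 20, 49, 4),
          datetime(2015, 1, 1, 20, 59, 21)],
    'b': [2, 4, 26, 22, 45, 3, 8, 121, 34],
})

# Start of the 15-minute interval each row falls into
# (bin_start is a hypothetical helper name, not part of the question).
bin_start = d1['a'].dt.floor('15min')

# Rows per interval, ordered chronologically.
sizes = bin_start.value_counts().sort_index()
print(sizes)
```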

I want my original DataFrame to have a new column holding the sequential number of the group that each row belongs to. For example, the first group

2015-01-01 20:00:00 

has 2 rows, so the first two rows of the new column in d1 should contain the number 1.

The second group

2015-01-01 20:15:00

has 1 row, so the third row of the new column in d1 should contain the number 2.

The third group

2015-01-01 20:30:00

has 4 rows, so the fourth, fifth, sixth, and seventh rows of the new column in d1 should contain the number 3.

I want my new DataFrame to look like this:

          a              b   c
0 2015-01-01 20:02:01    2   1
1 2015-01-01 20:14:58    4   1
2 2015-01-01 20:17:05   26   2
3 2015-01-01 20:31:05   22   3
4 2015-01-01 20:34:28   45   3
5 2015-01-01 20:37:51    3   3
6 2015-01-01 20:41:19    8   3
7 2015-01-01 20:49:04  121   4
8 2015-01-01 20:59:21   34   4

Upvotes: 0

Views: 33

Answers (1)

Alicia Garcia-Raboso

Reputation: 13923

Use .transform() on your groupby object with an itertools.count iterator:

from datetime import datetime
from itertools import count
import pandas as pd

d1 = pd.DataFrame({'a': [datetime(2015,1,1,20,2,1), datetime(2015,1,1,20,14,58),
                         datetime(2015,1,1,20,17,5), datetime(2015,1,1,20,31,5),
                         datetime(2015,1,1,20,34,28), datetime(2015,1,1,20,37,51),
                         datetime(2015,1,1,20,41,19), datetime(2015,1,1,20,49,4),
                         datetime(2015,1,1,20,59,21)],
                   'b': [2, 4, 26, 22, 45, 3, 8, 121, 34]})
d2 = d1.set_index('a')

counter = count(1)
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which later pandas releases removed
d2['c'] = (d2.groupby(pd.Grouper(freq='15Min'))['b']
             .transform(lambda x: next(counter)))
print(d2)

Output:

                       b  c
a                          
2015-01-01 20:02:01    2  1
2015-01-01 20:14:58    4  1
2015-01-01 20:17:05   26  2
2015-01-01 20:31:05   22  3
2015-01-01 20:34:28   45  3
2015-01-01 20:37:51    3  3
2015-01-01 20:41:19    8  3
2015-01-01 20:49:04  121  4
2015-01-01 20:59:21   34  4
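
As a possible alternative, the group number can be computed without a stateful counter, which avoids depending on transform calling the function exactly once per group, in order. The sketch below is not the method from the answer above: it floors each timestamp to its 15-minute interval and numbers the distinct intervals with pd.factorize. It assumes the rows are already sorted by time, since factorize numbers values in order of first appearance, and it keeps 'a' as a regular column as in the question's desired output.

```python
from datetime import datetime
import pandas as pd

d1 = pd.DataFrame({
    'a': [datetime(2015, 1, 1, 20, 2, 1), datetime(2015, 1, 1, 20, 14, 58),
          datetime(2015, 1, 1, 20, 17, 5), datetime(2015, 1, 1, 20, 31, 5),
          datetime(2015, 1, 1, 20, 34, 28), datetime(2015, 1, 1, 20, 37, 51),
          datetime(2015, 1, 1, 20, 41, 19), datetime(2015, 1, 1, 20, 49, 4),
          datetime(2015, 1, 1, 20, 59, 21)],
    'b': [2, 4, 26, 22, 45, 3, 8, 121, 34],
})

# factorize returns (codes, uniques); codes start at 0 in order of first
# appearance, so adding 1 gives group numbers starting at 1.
codes, _ = pd.factorize(d1['a'].dt.floor('15min'))
d1['c'] = codes + 1
print(d1)
```

Because only intervals that actually contain rows are numbered, the numbering stays consecutive even if some 15-minute interval has no data.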

Upvotes: 1
