Reputation: 1114
I have a DataFrame with a timestamp column
d1=DataFrame({'a':[datetime(2015,1,1,20,2,1),datetime(2015,1,1,20,14,58),
datetime(2015,1,1,20,17,5),datetime(2015,1,1,20,31,5),
datetime(2015,1,1,20,34,28),datetime(2015,1,1,20,37,51),datetime(2015,1,1,20,41,19),
datetime(2015,1,1,20,49,4),datetime(2015,1,1,20,59,21)], 'b':[2,4,26,22,45,3,8,121,34]})
a b
0 2015-01-01 20:02:01 2
1 2015-01-01 20:14:58 4
2 2015-01-01 20:17:05 26
3 2015-01-01 20:31:05 22
4 2015-01-01 20:34:28 45
5 2015-01-01 20:37:51 3
6 2015-01-01 20:41:19 8
7 2015-01-01 20:49:04 121
8 2015-01-01 20:59:21 34
I can group by 15 minute intervals by doing these operations
d2=d1.set_index('a')
d3=d2.groupby(pd.TimeGrouper('15Min'))
The number of rows by group is found by
d3.size()
a
2015-01-01 20:00:00 2
2015-01-01 20:15:00 1
2015-01-01 20:30:00 4
2015-01-01 20:45:00 2
I want my original DataFrame to have a column corresponding to the unique number of rows in the specific group that it belongs to. For example, the first group
2015-01-01 20:00:00
has 2 rows so the first two rows of my new column in d1 should have the number 1
the second group
2015-01-01 20:15:00
has 1 row so the third row of my new column in d1 should have the number 2
the third group
2015-01-01 20:15:00
has 4 rows so the fourth, fifth, sixth, and seventh rows of my new column in d1 should have the number 3
I want my new DataFrame to look like this
a b c
0 2015-01-01 20:02:01 2 1
1 2015-01-01 20:14:58 4 1
2 2015-01-01 20:17:05 26 2
3 2015-01-01 20:31:05 22 3
4 2015-01-01 20:34:28 45 3
5 2015-01-01 20:37:51 3 3
6 2015-01-01 20:41:19 8 3
7 2015-01-01 20:49:04 121 4
8 2015-01-01 20:59:21 34 4
Upvotes: 0
Views: 33
Reputation: 13923
Use .transform()
on your groupby
object with an itertools.count
iterator:
from datetime import datetime
from itertools import count
import pandas as pd
d1 = pd.DataFrame({'a': [datetime(2015,1,1,20,2,1), datetime(2015,1,1,20,14,58),
datetime(2015,1,1,20,17,5), datetime(2015,1,1,20,31,5),
datetime(2015,1,1,20,34,28), datetime(2015,1,1,20,37,51),
datetime(2015,1,1,20,41,19), datetime(2015,1,1,20,49,4),
datetime(2015,1,1,20,59,21)],
'b': [2, 4, 26, 22, 45, 3, 8, 121, 34]})
d2 = d1.set_index('a')
counter = count(1)
d2['c'] = (d2.groupby(pd.TimeGrouper('15Min'))['b']
.transform(lambda x: next(counter)))
print(d2)
Output:
b c
a
2015-01-01 20:02:01 2 1
2015-01-01 20:14:58 4 1
2015-01-01 20:17:05 26 2
2015-01-01 20:31:05 22 3
2015-01-01 20:34:28 45 3
2015-01-01 20:37:51 3 3
2015-01-01 20:41:19 8 3
2015-01-01 20:49:04 121 4
2015-01-01 20:59:21 34 4
Upvotes: 1