Reputation: 2219
I have a df that I am grouping by two columns. I want to count each group sequentially. The code below counts each row within a group sequentially. This seems easier than I think but can't figure it out.
df = pd.DataFrame({
'Key': ['10003', '10009', '10009', '10009',
'10009', '10034', '10034', '10034'],
'Date1': [20120506, 20120506, 20120506, 20120506,
20120620, 20120206, 20120206, 20120405],
'Date2': [20120528, 20120507, 20120615, 20120629,
20120621, 20120305, 20120506, 20120506]
})
df['Count'] = df.groupby(['Key','Date1']).cumcount() + 1
Anticipated result:
Date1 Date2 Key Count
0 20120506 20120528 10003 1
1 20120506 20120507 10009 2
2 20120506 20120615 10009 2
3 20120506 20120629 10009 2
4 20120620 20120621 10009 3
5 20120206 20120305 10034 4
6 20120206 20120506 10034 4
7 20120405 20120506 10034 5
Upvotes: 5
Views: 348
Reputation: 294218
You can use pd.factorize
to labe unique values, which can be tuples
.
df['Count'] = pd.factorize(list(zip(df.Key, df.Date1)))[0] + 1
df
Date1 Date2 Key Count
0 20120506 20120528 10003 1
1 20120506 20120507 10009 2
2 20120506 20120615 10009 2
3 20120506 20120629 10009 2
4 20120620 20120621 10009 3
5 20120206 20120305 10034 4
6 20120206 20120506 10034 4
7 20120405 20120506 10034 5
Upvotes: 2
Reputation: 323226
Or maybe category
, pd.factorize
also work for it
(df['Key'].astype(str)+df['Date1'].astype(str)).astype('category').cat.codes.add(1)
Out[60]:
0 1
1 2
2 2
3 2
4 3
5 4
6 4
7 5
dtype: int8
Upvotes: 2
Reputation: 402283
You're looking for groupby
+ ngroup
:
df['Count'] = df.groupby(['Key','Date1']).ngroup() + 1
df
Date1 Date2 Key Count
0 20120506 20120528 10003 1
1 20120506 20120507 10009 2
2 20120506 20120615 10009 2
3 20120506 20120629 10009 2
4 20120620 20120621 10009 3
5 20120206 20120305 10034 4
6 20120206 20120506 10034 4
7 20120405 20120506 10034 5
ngroup
simply gives each group a label.
Upvotes: 5