Reputation: 185
So I have this kind of data. Basically there are bunch of people do something and we keep track of them. So I am interested in one of the activities, i.e running. ID - person's id, date - date, DONE - Yes if they run that day. I want to find out in certain periods how many times people did that activity.
ID DATE DONE
1 2009-1-3 Yes
1 2009-1-15 Yes
2 2009-9-3 Yes
3 2009-10-7 No
1 2009-12-1 Yes
3 2009-10-8 Yes
2 2009-10-21 Yes
I want something like this, not necessarily exactly like this.
ID PERIOD COUNT
1 2009-1 2
2009-12 1
2 2009-9 1
2009-10 1
3 2009-10 1
Thanks in advance!
Upvotes: 1
Views: 197
Reputation: 863156
If need count also No
values like 0
aggregate sum
with boolean mask, for month periods use Series.dt.to_period
:
df1 = ((df['DONE'] == 'Yes')
.groupby([df['ID'], pd.to_datetime(df['DATE']).dt.to_period('m')])
.sum()
.astype(int)
.reset_index(name='COUNT'))
print (df1)
ID DATE COUNT
0 1 2009-01 2
1 1 2009-12 1
2 2 2009-09 1
3 2 2009-10 1
4 3 2009-10 1
Or you can first filter only rows with Yes
and aggregate counts by GroupBy.size
:
df1 = (df[df['DONE'] == 'Yes']
.groupby(['ID', pd.to_datetime(df['DATE']).dt.to_period('m')])
.size()
.reset_index(name='COUNT'))
print (df1)
ID DATE COUNT
0 1 2009-01 2
1 1 2009-12 1
2 2 2009-09 1
3 2 2009-10 1
4 3 2009-10 1
Upvotes: 1