Reputation: 73
I have a dataframe like the one below, and I want the count of occurrences of each grid_id value on a daily basis:
new_df.head(6)
Out[165]:
grid_id date_str
0 173159 2019-07-19
1 173861 2019-07-19
2 173159 2019-07-19
3 173860 2019-07-19
4 172460 2019-07-19
0 173159 2019-07-20
Expected output is:
grid_id date_str grid_id_count
0 173159 2019-07-19 2
0 173159 2019-07-20 1
1 173861 2019-07-19 1
3 173860 2019-07-19 1
4 172460 2019-07-19 1
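For anyone who wants to try the answers below, a minimal reconstruction of the sample frame (dtypes assumed: grid_id as int, date_str as plain string; the repeated index 0 mirrors the question's output, which suggests two frames were concatenated):

```python
import pandas as pd

# Rebuild the question's sample data, including the duplicated index label 0.
new_df = pd.DataFrame(
    {
        "grid_id": [173159, 173861, 173159, 173860, 172460, 173159],
        "date_str": ["2019-07-19"] * 5 + ["2019-07-20"],
    },
    index=[0, 1, 2, 3, 4, 0],
)
print(new_df)
```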
Upvotes: 1
Views: 40
Reputation: 10960
Group by both columns so each unique (grid_id, date_str) pair forms a group, take the size of each group, then reset the index to get back a DataFrame:
>>> df.groupby(['grid_id', 'date_str']).size().reset_index(name='count')
grid_id date_str count
0 172460 2019-07-19 1
1 173159 2019-07-19 2
2 173159 2019-07-20 1
3 173860 2019-07-19 1
4 173861 2019-07-19 1
Upvotes: 0
Reputation: 75080
You can use df.drop_duplicates to drop dupes based on both of the columns, then groupby on grid_id, do a cumcount with ascending=False, and assign the result to a new column:
m = df.drop_duplicates(['grid_id', 'date_str'])
final = (m.assign(grid_id_count=m.groupby('grid_id')
                   .cumcount(ascending=False).add(1))
          .sort_index())
final
grid_id date_str grid_id_count
0 173159 2019-07-19 2
0 173159 2019-07-20 1
1 173861 2019-07-19 1
3 173860 2019-07-19 1
4 172460 2019-07-19 1
Upvotes: 1