pik
pik

Reputation: 73

finding count of a column based on the number of occurences

I have a dataframe like below:

I want the count of occurrences of a column on daily basis

new_df.head(5)
Out[165]: 
   grid_id    date_str
0   173159  2019-07-19
1   173861  2019-07-19
2   173159  2019-07-19
3   173860  2019-07-19
4   172460  2019-07-19
0   173159  2019-07-20

Expected output is:

grid_id    date_str grid_id_count
0   173159  2019-07-19 2
0   173159  2019-07-20 1
1   173861  2019-07-19 1
3   173860  2019-07-19 1
4   172460  2019-07-19 1

Upvotes: 1

Views: 40

Answers (2)

Vishnudev Krishnadas
Vishnudev Krishnadas

Reputation: 10960

Groupby all columns creating unique rows and then get the size of each group and then reset the index to get back the dataframe.

>>> df.groupby(['grid_id', 'date_str']).size().reset_index(name='count')
   grid_id    date_str  count
0   172460  2019-07-19  1
1   173159  2019-07-19  2
2   173159  2019-07-20  1
3   173860  2019-07-19  1
4   173861  2019-07-19  1

Upvotes: 0

anky
anky

Reputation: 75080

You can use df.drop_duplicates to drop dupes based on both the cols, then groupby on grid_id and do a cumcount with ascending=False and assign it to a new column:

m=df.drop_duplicates(['grid_id','date_str']) 
final=(m.assign(grid_id_count=m.groupby('grid_id')
          .cumcount(ascending=False).add(1)).sort_index())

   grid_id    date_str  grid_id_count
0   173159  2019-07-19              2
0   173159  2019-07-20              1
1   173861  2019-07-19              1
3   173860  2019-07-19              1
4   172460  2019-07-19              1

Upvotes: 1

Related Questions