Dave Will
Dave Will

Reputation: 95

Groupby with sum over columns to get frequency of each item using pandas

I am trying to get the frequency of days grouped by its ID. So I would like to know:

  1. The total number of days for each ID and
  2. the frequency of each day per ID.

So it should be look like this:

my data

On top of that I tried the following:

number = data.groupby(by=['ID, Date_Time']).size().unstack(fill_value=0) 

but this on gives me a different data frame that I can't add it to the original on "data".

Upvotes: 0

Views: 998

Answers (1)

Ruli
Ruli

Reputation: 2780

I have created a simple data sample as you have only provided a screenshot, you need to convert the date to datetime format and then group according to id and pd.Grouper according to distinct days (D). For total all you need is groupby and transform the count of each group.

data = [[1, "2010-01-01 12:21:46"],[1, "2010-01-01 12:22:46"],[1, "2010-04-02 12:21:46"],[1, "2010-06-06 12:21:46"],[2, "2010-06-06 12:21:46"],[2, "2010-06-06 19:21:46"],[2, "2010-09-06 12:21:46"]]

import pandas as pd 

df = pd.DataFrame(data,columns=("id","date_"))
df['date_']= pd.to_datetime(df['date_'],format='%Y-%m-%d %H:%M:%S')
df['freq'] = df.groupby(['id', pd.Grouper(key='date_', freq='D')])['id'].transform('count')
df['total'] = df.groupby('id')['id'].transform('count')

Final ouptput will look like this:

   id               date_  freq  total
0   1 2010-01-01 12:21:46     2      4
1   1 2010-01-01 12:22:46     2      4
2   1 2010-04-02 12:21:46     1      4
3   1 2010-06-06 12:21:46     1      4
4   2 2010-06-06 12:21:46     2      3
5   2 2010-06-06 19:21:46     2      3
6   2 2010-09-06 12:21:46     1      3

Upvotes: 3

Related Questions