Guy s
Guy s

Reputation: 1806

Pandas - Counting the number of days for group by

I want to count the number of days after grouping by 2 columns:

groups = df.groupby([df.col1,df.col2])

Now i want to count the number of days relevant for each group:

result = groups['date_time'].dt.date.nunique()

I'm using something similar when I want to group by day, but here I get an error:

AttributeError: Cannot access attribute 'dt' of 'SeriesGroupBy' objects, try using the 'apply' method

What is the proper way to get the number of days?

Upvotes: 2

Views: 2675

Answers (1)

jezrael
jezrael

Reputation: 862471

You need another variation of groupby - define column first:

df['date_time'].dt.date.groupby([df.col1,df.col2]).nunique()

df.groupby(['col1','col2'])['date_time'].apply(lambda x: x.dt.date.nunique())

df['date_time1'] = df['date_time'].dt.date
a = df.groupby([df.col1,df.col2]).date_time1.nunique()

Sample:

start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10, freq='15H')

df = pd.DataFrame({'date_time': rng, 'col1': [0]*5 + [1]*5, 'col2': [2]*3 + [3]*4+ [4]*3})  
print (df)
   col1  col2           date_time
0     0     2 2015-02-24 00:00:00
1     0     2 2015-02-24 15:00:00
2     0     2 2015-02-25 06:00:00
3     0     3 2015-02-25 21:00:00
4     0     3 2015-02-26 12:00:00
5     1     3 2015-02-27 03:00:00
6     1     3 2015-02-27 18:00:00
7     1     4 2015-02-28 09:00:00
8     1     4 2015-03-01 00:00:00
9     1     4 2015-03-01 15:00:00
#solution with apply
df1 = df.groupby(['col1','col2'])['date_time'].apply(lambda x: x.dt.date.nunique())
print (df1)
col1  col2
0     2       2
      3       2
1     3       1
      4       2
Name: date_time, dtype: int64

#create new helper column
df['date_time1'] = df['date_time'].dt.date
df2 = df.groupby([df.col1,df.col2]).date_time1.nunique()
print (df2)
col1  col2
0     2       2
      3       2
1     3       1
      4       2
Name: date_time, dtype: int64

df3 = df['date_time'].dt.date.groupby([df.col1,df.col2]).nunique()
print (df3)
col1  col2
0     2       2
      3       2
1     3       1
      4       2
Name: date_time, dtype: int64

Upvotes: 3

Related Questions