Anakin Skywalker

Reputation: 2520

Pandas groupby month and year (date as datetime64[ns]) and summarized by count

I have a data frame, which I created in pandas, grouping by date and summarizing by rides.

          date   rides
0   2019-01-01  247279
1   2019-01-02  585996
2   2019-01-03  660631
3   2019-01-04  662011
4   2019-01-05  440848
..         ...     ...
451 2020-03-27  218499
452 2020-03-28  143305
453 2020-03-29  110833
454 2020-03-30  207743
455 2020-03-31  199623

[456 rows x 2 columns]

My date column is in datetime64[ns].

date     datetime64[ns]
rides             int64
dtype: object

Now I would like to create another data frame, grouped by month and year (I have data from 2019 and 2020), with rides summed for each group.

Ideal output:

Year Month   Rides
2019 January 2000000
2020 March   1000000

Upvotes: 7

Views: 10895

Answers (2)

ALollz

Reputation: 59529

Datetime columns also support the to_period conversion, so we can group everything by a monthly period.

df.groupby(df.date.dt.to_period('M')).agg('sum')
#           rides
#date            
#2019-01  2596765
#2020-03   880003

In this case the index is a PeriodIndex, which has many of the same datetime attributes.

PeriodIndex(['2019-01', '2020-03'], dtype='period[M]', name='date', freq='M')
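If separate Year and Month columns are wanted, as in the question's ideal output, they can be derived from the PeriodIndex. The following is only a sketch: the sample frame is made up for illustration, the column names Year, Month and Rides are taken from the question, and strftime("%B") is assumed to return English month names (it depends on the locale).

```python
import pandas as pd

# Hypothetical sample data standing in for the frame in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-01-01", "2019-01-02", "2020-03-30", "2020-03-31"]
    ),
    "rides": [10, 20, 30, 40],
})

# Sum rides per monthly period; the result is indexed by a PeriodIndex.
monthly = df.groupby(df["date"].dt.to_period("M"))["rides"].sum()

# Pull the year and the month name out of the PeriodIndex.
result = pd.DataFrame({
    "Year": monthly.index.year,
    "Month": monthly.index.strftime("%B"),  # locale-dependent month name
    "Rides": monthly.to_numpy(),
})
print(result)
```

Because the groups are periods, the rows come out in chronological order.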

Upvotes: 7

Ben.T

Reputation: 29635

You can group by dt.year and dt.month_name() taken from the date column.

print (df.groupby([df['date'].dt.year.rename('year'), 
                   df['date'].dt.month_name().rename('month')])
         ['rides'].sum().reset_index())
   year    month    rides
0  2019  January  2596765
1  2020    March   880003
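One caveat: groupby sorts its keys, so with several months in the same year the month names come out alphabetically (April before January). A possible workaround, sketched here with made-up sample data and a helper key named month_num (not part of the original answer), is to group on the month number as well and drop it afterwards.

```python
import pandas as pd

# Hypothetical sample data with several months in one year.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-01-15", "2019-02-15", "2019-04-15", "2019-04-20"]
    ),
    "rides": [1, 2, 3, 4],
})

out = (
    df.groupby([
        df["date"].dt.year.rename("year"),
        df["date"].dt.month.rename("month_num"),   # numeric key keeps calendar order
        df["date"].dt.month_name().rename("month"),
    ])["rides"]
    .sum()
    .reset_index()
    .drop(columns="month_num")  # helper key is no longer needed
)
print(out)
```

The rows are then ordered by year and calendar month rather than by the spelling of the month name.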

Upvotes: 11
