Bennett_Eleven

Reputation: 21

How to use groupby() to group data that spans many years by the day and month?

I have a dataset that spans many years, with multiple entries for every day of the year. I would like to group by the day of the year (i.e. day and month, ignoring the year) and then use agg() to find the max value, so that I am left with 365 rows.

My current method is to create a new column from the index, which is in the format yyyy-mm-dd, strip the year so that it becomes mm-dd, and use groupby on this column. I know that will work.

However, I have converted the index to a DatetimeIndex using pd.to_datetime(df.index). I want to learn how to work with a datetime index, and I believe there is a clean way to do this with it directly. Please show me a clean way :). Thank you. This is what I have tried:

df.groupby([df.index.month, df.index.day]).agg({"Data_Value":np.max})

The output has a MultiIndex (month, day), and I do not want it that way.

Upvotes: 0

Views: 48

Answers (2)

Parfait

Reputation: 107767

Since your data is a time series, consider DataFrame.resample:

df[["Data_Value"]].resample("1D").max()
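
One thing worth checking with resample is how many rows come back. The sketch below runs the call on made-up data (the two-year date range and the values are assumptions, not the asker's dataset). As far as I can tell, resample("1D") bins by calendar date, so each year keeps its own rows rather than being merged by month and day.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two readings per day across 2019 and 2020.
dates = pd.date_range("2019-01-01", "2020-12-31", freq="D")
idx = dates.repeat(2)
df = pd.DataFrame({"Data_Value": np.arange(len(idx), dtype=float)}, index=idx)

# One row per calendar date: the max of that date's readings.
daily = df[["Data_Value"]].resample("1D").max()

# 365 days in 2019 plus 366 in 2020, so years are not collapsed.
print(len(daily))
```

On this sample the result has 731 rows, one per date, so a further grouping step would still be needed to get down to one row per day of the year.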

Upvotes: 0

Paul Maxwell

Reputation: 35623

Use strftime on the date information so you get "mm-dd" values, then aggregate by that:

df.groupby(df.index.strftime('%m-%d')).agg({"Data_Value":np.max})
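
Here is a minimal, self-contained sketch of the same idea on made-up data (the date range and values are assumptions for illustration). It passes the string "max" to agg instead of np.max, which should behave the same here. Because the group key is a single array of "mm-dd" strings, the result has a flat index rather than a MultiIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two readings per day across 2019 and 2020.
dates = pd.date_range("2019-01-01", "2020-12-31", freq="D")
idx = dates.repeat(2)
df = pd.DataFrame({"Data_Value": np.arange(len(idx), dtype=float)}, index=idx)

# Group on a single "mm-dd" key built from the DatetimeIndex.
key = df.index.strftime("%m-%d")
by_day = df.groupby(key).agg({"Data_Value": "max"})
by_day.index.name = "month_day"  # illustrative label, not required

print(by_day.head())
print(len(by_day))
```

Note that if the data includes a leap year, as this sample does, you end up with 366 rows rather than 365, because "02-29" forms its own group.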

Upvotes: 0