Reputation: 21
I have a dataset that spans many years, with multiple entries for every day of the year. I would like to group by the day of the year (i.e. month and day, ignoring the year), then use agg() to find the max value, so that I am left with 365 rows.
My current method is to create a new column from the index, which is in the format yyyy-mm-dd, strip the year so that it becomes mm-dd, and then use groupby on this column. I know that will work.
However, I have already converted the index to datetime using pd.to_datetime(df.index). I want to learn how to work with a datetime index, and I believe there is a clean way to do this with it directly. Please show me a clean way :). Thank you.
df.groupby([df.index.month, df.index.day]).agg({"Data_Value":np.max})
The output has a MultiIndex (month, day), and I do not want it that way.
Upvotes: 0
Views: 48
Reputation: 107767
Since your data is a time series, consider DataFrame.resample:
df[["Data_Value"]].resample("1D").max()
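To see what this returns, here is a minimal sketch on made-up data (the dates and values are assumptions, not from the question). `resample("1D")` bins by calendar date, so each year keeps its own rows and empty days appear as NaN; it does not merge the same day across years.

```python
import pandas as pd

# Hypothetical sample: two readings on 2019-01-01, one on 2020-01-01.
idx = pd.to_datetime(["2019-01-01", "2019-01-01", "2020-01-01"])
df = pd.DataFrame({"Data_Value": [5, 7, 9]}, index=idx)

# One row per calendar day between the first and last timestamp.
per_date = df[["Data_Value"]].resample("1D").max()

# Days without data are NaN; drop them to see the populated dates only.
populated = per_date.dropna()
print(populated)
```

On this sample the two years stay as separate rows, so a month-day key is still needed if the goal is one row per day of the year.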
Upvotes: 0
Reputation: 35623
Use strftime on the date information so you get "mm-dd" values, then aggregate by that:
df.groupby(df.index.strftime('%m-%d')).agg({"Data_Value":np.max})
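A small self-contained sketch of the same idea, using made-up data (the dates and values are assumptions for illustration). It selects the column and calls `.max()` directly, which should be equivalent to the `agg` call for a single column, and shows that the result has a flat "mm-dd" index rather than a MultiIndex.

```python
import pandas as pd

# Hypothetical sample: two years, with repeated readings on some days.
idx = pd.to_datetime(
    ["2019-01-01", "2019-01-01", "2019-01-02",
     "2020-01-01", "2020-01-02", "2020-01-02"]
)
df = pd.DataFrame({"Data_Value": [5, 7, 3, 9, 1, 4]}, index=idx)

# Format each timestamp as "mm-dd" so all years share the same group key.
month_day = df.index.strftime("%m-%d")

# Max per month-day across all years; the result index is single-level.
daily_max = df.groupby(month_day)["Data_Value"].max()
print(daily_max)
```

Because the "mm-dd" strings are zero-padded, the groups sort in calendar order. If the data includes leap years, expect a "02-29" group as well.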
Upvotes: 0