Reputation: 73
I have daily rainfall data that looks like the following:
Date Rainfall (mm)
1922-01-01 0.0
1922-01-02 0.0
1922-01-03 0.0
1922-01-04 0.0
1922-01-05 31.5
1922-01-06 0.0
1922-01-07 0.0
1922-01-08 0.0
1922-01-09 0.0
1922-01-10 0.0
1922-01-11 0.0
1922-01-12 9.1
1922-01-13 6.4
.
.
.
I am trying to work out the maximum value for each month for each year, and also what date the maximum value occurred on. I have been using the code:
rain_data.groupby(pd.Grouper(freq = 'M'))['Rainfall (mm)'].max()
This is returning the correct maximum value but returns the end date of each month rather than the date that maximum event occurred on.
1974-11-30 0.0
1974-12-31 0.0
1975-01-31 0.0
1975-02-28 65.0
1975-03-31 129.5
1975-11-30 59.9
1975-12-31 7.1
1976-01-31 10.0
1976-11-30 0.0
1976-12-31 0.0
1977-01-31 4.3
Any suggestions on how I could get the correct date?
Upvotes: 0
Views: 143
Reputation: 53
I'm new to this, but what I think you're doing in (pd.Grouper(freq = 'M'))
is grouping all the values in each month, but it's assigning every value within a group to the same date. I think this is why your groupby
isn't returning the dates you're looking for.
I think your question is answered here. Alexander suggests to use:
df.groupby(pd.TimeGrouper('M')).Close.agg({'max date': 'idxmax', 'max rainfall': np.max})
The agg
works without the Close
I think, so if it's problematic (as I found) you might want to take it out.
Upvotes: 1