Joe Emmens
Joe Emmens

Reputation: 147

How to group by day of the year and calculate the max value in pandas

I have time series data ranging from 2005-2014 with entries for each day of the year and the corresponding maximum temperature. I have created this series from a larger data set called data.

In[80]: data
Out[80]: 
            ID        Date Element  Data_Value
41334  USC00208080  2005-01-01    TMAX          33
55428  USC00207308  2005-01-01    TMIN         -44
32266  USC00208202  2005-01-01    TMAX         150
2073   USC00203712  2005-01-01    TMAX         144
55424  USC00207308  2005-01-01    TMAX         150

In[79]: max_temp
Out[79]: 
Date
2005-01-01    156
2005-01-02    139
2005-01-03    133
2005-01-04     39
2005-01-05     33

max_temp = data.groupby("Date")["Data_Value"].max()

As you can see this returns the max for each date for each year. What I want to do is find the maximum value for each day over the range 2005-2014

Desired result:

In[79]: max_temp
Out[79]: 
Date
01-01    # max for Jan 1st between 2005-2014
01-02    # max for Jan 2nd between 2005-2014
01-03    # max for Jan 3rd between 2005-2014
01-04    # max for Jan 4th ...
01-05    # max for Jan 5th ...

Upvotes: 1

Views: 2275

Answers (2)

y0j0
y0j0

Reputation: 3602

First of all I would suggest to split datetime to day, month and year.

data['year'] = data['Date'].dt.year
data['month'] = data['Date'].dt.month
data['day'] = data['Date'].dt.day 

Then you can group by only by day and month like:

max_temp = data.groupby(['month','day'])["Data_Value"].max()

Upvotes: 4

Serge Ballesta
Serge Ballesta

Reputation: 148910

First, the direct answer to your question. You need the dt accessor to get day or month from a datetime64 column:

max_temp = data.groupby([data["Date"].dt.month, data["Date"].dt.day])["Data_Value"].max()

That being said, on a meteorological point of view, using statistics on a single day of the year does not make sense, because the temperature (or other meteo measures) vary too much. Professionals use stats on full year, quarter, month or decade (1-10, 11-20, 21-end_of_month). Using a shorter period only add noise.

Upvotes: 0

Related Questions