Reputation: 49
I have a 40 year time series in the format stn;yyyymmddhh;rainfall , where yyyy= year, mm = month, dd= day,hh= hour. The series is at an hourly resolution. I extracted the maximum values for each year by the following groupby method:
import pandas as pd
df = pd.read_csv('data.txt', delimiter = ";")
df['yyyy'] = df['yyyymmhhdd'].astype(str).str[:4]
df.groupby(['yyyy'])['rainfall'].max().reset_index()
Now, i am trying to extract the maximum values for 3 hour duration each year. I tried this sliding maxima approach but it is not working. k is the duration I am interested in. In simple words,i need maximum precipitation sum for multiple durations in every year (eg 3h, 6h, etc)
class AMS:
def sliding_max(self, k, data):
tp = data.values
period = 24*365
agg_values = []
start_j = 1
end_j = k*int(np.floor(period/k))
for j in range(start_j, end_j + 1):
start_i = j - 1
end_i = j + k + 1
agg_values.append(np.nansum(tp[start_i:end_i]))
self.sliding_max = max(agg_values)
return self.sliding_max
Any suggestions or improvements in my code or is there a way i can implement it with groupby. I am a bit new to python environment, so please excuse if the question isn't put properly.
Stn;yyyymmddhh;rainfall
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
xyz;1981010116;0.0
xyz;1981010117;0.0
xyz;1981010118;0.2
xyz;1981010119;0.0
xyz;1981010120;0.0
xyz;1981010121;0.0
xyz;1981010122;0.0
xyz;1981010123;0.0
xyz;1981010200;0.0
Upvotes: 0
Views: 445
Reputation: 703
You first have to convert your column containing the datetimes to a Series
of type datetime
. You can do that parsing by providing the format of your datetimes.
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"], format="%Y%M%d%H")
After having the correct data type you have to set that column as your index and can now use pandas
functionality for time series data (resampling in your case).
First you resample the data to 3 hour windows and sum the values. From that you resample to yearly data and take the maximum value of all the 3 hour windows for each year.
df.set_index("yyyymmddhh").resample("3H").sum().resample("Y").max()
# Output
yyyymmddhh rainfall
1981-12-31 1.1
Upvotes: 1