Dawar

Reputation: 49

Resampling a time series

I have a 40-year time series in the format stn;yyyymmddhh;rainfall, where yyyy = year, mm = month, dd = day, hh = hour. The series has hourly resolution. I extracted the maximum value for each year with the following groupby approach:

import pandas as pd
df = pd.read_csv('data.txt', delimiter = ";")
df['yyyy'] = df['yyyymmddhh'].astype(str).str[:4]
df.groupby(['yyyy'])['rainfall'].max().reset_index()

Now I am trying to extract the maximum value over a 3-hour duration for each year. I tried the sliding-maxima approach below, but it is not working; k is the duration I am interested in. In simple words, I need the maximum precipitation sum for multiple durations in every year (e.g. 3 h, 6 h, etc.).

class AMS:
    def sliding_max(self, k, data):
        tp = data.values
        period = 24*365
        agg_values = []
        start_j = 1
        end_j = k*int(np.floor(period/k))
        for j in range(start_j, end_j + 1):
            start_i = j - 1
            end_i = j + k + 1
            agg_values.append(np.nansum(tp[start_i:end_i]))
        self.sliding_max = max(agg_values)
        return self.sliding_max

Any suggestions for improving my code, or is there a way I can implement this with groupby? I am fairly new to Python, so please excuse me if the question isn't put properly.

Stn;yyyymmddhh;rainfall 
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
xyz;1981010116;0.0
xyz;1981010117;0.0
xyz;1981010118;0.2
xyz;1981010119;0.0
xyz;1981010120;0.0
xyz;1981010121;0.0
xyz;1981010122;0.0
xyz;1981010123;0.0
xyz;1981010200;0.0

Upvotes: 0

Views: 445

Answers (1)

Alex G

Reputation: 703

You first have to convert the column containing the timestamps to a Series of type datetime. You can do that by passing the format of your timestamps to pd.to_datetime.

df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"], format="%Y%m%d%H")
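As a quick sanity check of the format string, the sketch below parses two made-up timestamps in the question's yyyymmddhh layout (the values are hypothetical, not taken from the data file) and confirms that the month and hour come out where expected. Note that `%m` is the month directive, while `%M` means minutes.

```python
import pandas as pd

# Hypothetical timestamps in the question's yyyymmddhh layout.
stamps = pd.Series([1981010109, 1981123123])

# %Y = 4-digit year, %m = month, %d = day, %H = hour (24-hour clock).
parsed = pd.to_datetime(stamps.astype(str), format="%Y%m%d%H")

print(parsed.dt.month.tolist())  # [1, 12]
print(parsed.dt.hour.tolist())   # [9, 23]
```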

Once the column has the correct data type, set it as the index; you can then use the pandas functionality for time series data (resampling in your case).
First resample the data to fixed 3-hour windows and sum the values. Then resample that result to yearly data and take the maximum over all 3-hour windows in each year.

df.set_index("yyyymmddhh")[["rainfall"]].resample("3H").sum().resample("Y").max()

# Output
yyyymmddhh  rainfall 
1981-12-31  1.1
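The resample call above bins the data into fixed, non-overlapping windows (00-02, 03-05, ...), so a wet spell that straddles two bins is split. If a true sliding maximum is wanted, as the question describes, one possible sketch is a rolling sum followed by a per-year maximum. The helper name `annual_max` and the inline sample are illustrative assumptions, and the code assumes the hourly series has no missing timestamps.

```python
import io
import pandas as pd

# A few hours from the question's sample, used only for illustration.
raw = """Stn;yyyymmddhh;rainfall
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
xyz;1981010116;0.0
"""

df = pd.read_csv(io.StringIO(raw), delimiter=";")
df.columns = df.columns.str.strip()  # guard against stray spaces in the header
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"].astype(str), format="%Y%m%d%H")
rain = df.set_index("yyyymmddhh")["rainfall"]


def annual_max(series, hours):
    """Largest sum over any `hours` consecutive hourly readings, per year."""
    # Time-based window ending at each timestamp; min_periods=hours
    # leaves the incomplete windows at the start of the series as NaN.
    totals = series.rolling(f"{hours}h", min_periods=hours).sum()
    return totals.groupby(totals.index.year).max()


# One column per duration, one row per year.
ams = pd.DataFrame({f"{h}h": annual_max(rain, h) for h in (3, 6)})
print(ams.round(1))  # for 1981: about 1.1 (3h) and 1.5 (6h)
```

On this sample the sliding 6-hour maximum (about 1.5, for 10:00 to 15:00) is larger than what fixed 6-hour bins starting at midnight would give, which is the reason to prefer a rolling window for annual maxima. Windows are assigned to the year of their last hour, so a window spanning New Year counts toward the later year.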

Upvotes: 1
