Extract and sum totals for time period

Question

I have a dataset of rainfall, with rainfall values being associated with a time (minute) and date on which rainfall occurred, if no rainfall occurred, nothing is logged. As such some days with large storms may have 100 readings, while some days will have none or only one or two. Example below:

D   M   Y       DAY     Hr  Min   Tot
2   3   2002    2419    5   49    0.2
2   3   2002    2419    5   53    0.4
2   3   2002    2419    5   57    0.2
2   3   2002    2419    6   8     0.6
3   3   2002    2420    3   11    0.2
3   3   2002    2420    3   13    0.2
3   3   2002    2420    23  3     0.8
3   3   2002    2420    23  4     0.4
3   3   2002    2420    23  29    0.6
4   3   2002    2421    19  7     0.2
4   3   2002    2421    19  8     0.2
4   3   2002    2421    19  9     0.2
4   3   2002    2421    19  10    0.2
('Tot' is rainfall total)

This dataset is about ten years long, and I need to translate this data into 10 minute, 1 hour and 3 hour totals for correlation with another variable. For example, I need to be able to extract the total rainfall that occurred in the ten minutes 3/3/2002 23:00:00-23:10:00, which comes to 1.2, and then also for the hour from 3/3/2002 23:00:00 which comes to 1.8, and so on.

As I understand it, this is not simply a group by and sum; it would be if I were extracting daily totals (which I have done already), but this seems more complicated than that. Are there any ideas out there as to how I might go about completing this task?

Many thanks for any thoughts and suggestions!

Scott Boston · Accepted Answer

You can use the following to generate your three dataframes/series

Create a column with datetime dtype using `to_datetime`

df['date'] = pd.to_datetime(df['Y'].astype(str)+'-'+df['M'].astype(str)+'-'+df['D'].astype(str)+' '+df['Hr'].astype(str)+':'+df['Min'].astype(str)+':00')

`resample` date with the intervals 10T, 1H and 3H with `sum`

df_10T = df.resample('10T', on='date')['Tot'].sum()
df_1H = df.resample('1H', on='date')['Tot'].sum()
df_3H = df.resample('3H', on='date')['Tot'].sum()

Display non-zero values of each series for checking.

10 Minutes

df_10T[df_10T.ne(0)]
date
2002-03-02 05:40:00    0.2
2002-03-02 05:50:00    0.6
2002-03-02 06:00:00    0.6
2002-03-03 03:10:00    0.4
2002-03-03 23:00:00    1.2
2002-03-03 23:20:00    0.6
2002-03-04 19:00:00    0.6
2002-03-04 19:10:00    0.2
Name: Tot, dtype: float64

1 hour

df_1H[df_1H.ne(0)]
date
2002-03-02 05:00:00    0.8
2002-03-02 06:00:00    0.6
2002-03-03 03:00:00    0.4
2002-03-03 23:00:00    1.8
2002-03-04 19:00:00    0.8
Name: Tot, dtype: float64

3 hour

df_3H[df_3H.ne(0)]
date
2002-03-02 03:00:00    0.8
2002-03-02 06:00:00    0.6
2002-03-03 03:00:00    0.4
2002-03-03 21:00:00    1.8
2002-03-04 18:00:00    0.8
Name: Tot, dtype: float64

Extract and sum totals for time period

Answers (2)

Create a column with datetime dtype using `to_datetime`

`resample` date with the intervals 10T, 1H and 3H with `sum`

Display non-zero values of each series for checking.

10 Minutes

1 hour

3 hour

Related Questions

Extract and sum totals for time period

Answers (2)

Create a column with datetime dtype using to_datetime

resample date with the intervals 10T, 1H and 3H with sum

Display non-zero values of each series for checking.

10 Minutes

1 hour

3 hour

Related Questions

Create a column with datetime dtype using `to_datetime`

`resample` date with the intervals 10T, 1H and 3H with `sum`