reinhardt

Reputation: 2253

How to extract hourly data from a DataFrame in Python?

I have the following DataFrame:

dates               Final
2020-01-01 00:15:00 94.7
2020-01-01 00:30:00 94.1
2020-01-01 00:45:00 94.1
2020-01-01 01:00:00 95.0
2020-01-01 01:15:00 96.6
2020-01-01 01:30:00 98.4
2020-01-01 01:45:00 99.8
2020-01-01 02:00:00 99.8
2020-01-01 02:15:00 98.0
2020-01-01 02:30:00 95.1
2020-01-01 02:45:00 91.9
2020-01-01 03:00:00 89.5

The full dataset continues at 15-minute intervals up to the last row, 2021-01-01 00:00:00 95.6.

Since the frequency is 15 minutes, I would like to change it to 1 hour, perhaps by dropping the intermediate values.

Expected output

dates               Final
2020-01-01 01:00:00 95.0
2020-01-01 02:00:00 99.8
2020-01-01 03:00:00 89.5

With the last row being 2021-01-01 00:00:00 95.6

How can this be done?

Thanks

Upvotes: 3

Views: 658

Answers (2)

Souha Gaaloul

Reputation: 328

If you're doing data analysis or data science, I don't think dropping the intermediate values is a good approach. You should probably sum them instead (I don't know your use case, but I have some experience with time series data).
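
As a rough sketch of what aggregating could look like, assuming pandas and the twelve sample rows from the question: `resample` with `closed="right"` and `label="right"` groups 00:15 through 01:00 under the 01:00 stamp. Whether a sum or a mean is appropriate depends on what `Final` measures, which the question does not say, so both are shown.

```python
import pandas as pd

# Sample rows copied from the question (15-minute spacing).
df = pd.DataFrame({
    "dates": pd.date_range("2020-01-01 00:15:00", periods=12, freq="15min"),
    "Final": [94.7, 94.1, 94.1, 95.0, 96.6, 98.4,
              99.8, 99.8, 98.0, 95.1, 91.9, 89.5],
})

# Each hourly bin covers (hh-1:00, hh:00] and is labelled with its right edge,
# so the four readings ending at 01:00 are stamped 01:00.
hourly = (
    df.set_index("dates")["Final"]
      .resample("60min", closed="right", label="right")
      .agg(["sum", "mean"])
)
print(hourly)
```

With the default `closed="left"`, the 01:00 reading would instead be grouped with 01:15 to 01:45, so the two keyword arguments matter if the hourly stamps should match the expected output in the question.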

Upvotes: 0

ansev

Reputation: 30920

Use Series.dt.minute to perform boolean indexing:

df_filtered = df.loc[df['dates'].dt.minute.eq(0)]
# If the 'dates' column is not already datetime, convert it first:
# df_filtered = df.loc[pd.to_datetime(df['dates']).dt.minute.eq(0)]
print(df_filtered)
                 dates  Final
3  2020-01-01 01:00:00   95.0
7  2020-01-01 02:00:00   99.8
11 2020-01-01 03:00:00   89.5
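
For anyone who wants to try this end to end, here is a self-contained sketch on the sample rows from the question. It assumes the `dates` column arrives as strings (the question does not say), and the variable `on_hour` is just an illustrative name for the same filter applied to a DatetimeIndex.

```python
import pandas as pd

# Sample rows from the question, with 'dates' stored as strings on purpose.
stamps = pd.date_range("2020-01-01 00:15:00", periods=12, freq="15min")
df = pd.DataFrame({
    "dates": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "Final": [94.7, 94.1, 94.1, 95.0, 96.6, 98.4,
              99.8, 99.8, 98.0, 95.1, 91.9, 89.5],
})

# The .dt accessor needs a datetime column, so convert before filtering.
df["dates"] = pd.to_datetime(df["dates"])

# Keep only rows that fall exactly on the hour.
on_the_hour = df["dates"].dt.minute.eq(0) & df["dates"].dt.second.eq(0)
df_filtered = df.loc[on_the_hour]
print(df_filtered)

# Same filter when the timestamps are the index rather than a column.
indexed = df.set_index("dates")
on_hour = indexed[(indexed.index.minute == 0) & (indexed.index.second == 0)]
```

The extra check on seconds only matters if some timestamps are not aligned to whole minutes; for the data shown, testing the minute alone gives the same rows.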

Upvotes: 3
