Strak
Strak

Reputation: 145

Resample time series data hourly with gaps

I have a year-long data set with dates and temperatures that's sample every 15min with occasional gaps in the data in Python. I've saved the dates and temperatures into a dataframe in pandas:

2015-08-21 09:35:47  27.928
2015-08-21 09:55:47  28.839
2015-08-21 10:15:47  28.495
...
2016-08-10 05:39:24  11.246
2016-08-10 05:59:24  11.912
2016-08-10 06:19:24  13.066

I would like to resample the data into every hour and tried with this line:

 newDat=data.resample(rule='H')

I also tried using the first and last 'how' but still didn't get what I was looking for. My results looked like:

2015-08-21 09:00:00  28.383500
2015-08-21 10:00:00  27.959667
2015-08-21 11:00:00  29.513333
...
2016-08-10 06:00:00  14.235667
2016-08-10 07:00:00  20.867333
2016-08-10 08:00:00  26.725000

I don't want the actual temperature or time values changed. I would like the command to grab the first times within every hour of the day so I get something like:

2015-08-21 09:15:47  27.925
2015-08-21 10:15:47  28.495
2015-08-21 11:15:47  27.596
...
2016-08-10 07:19:24  16.944
2016-08-10 08:19:24  27.149
2016-08-10 09:19:24  28.152

Upvotes: 1

Views: 1200

Answers (1)

unutbu
unutbu

Reputation: 879719

The DataFrame.groupby method can accept a sequence of values as group keys. The length of the sequence is expected to be the same as the number of rows in the DataFrame, and rows corresponding to equal-valued items in the sequence are grouped.

So you could use dates truncated to the nearest hour as group keys:

In [133]: groupkey = data['date'].values.astype('datetime64[h]'); groupkey
Out[133]: 
array(['2015-08-21T09', '2015-08-21T09', '2015-08-21T10', '2016-08-10T05',
       '2016-08-10T05', '2016-08-10T06'], dtype='datetime64[h]')

import pandas as pd
Timestamp = pd.Timestamp
data = pd.DataFrame({'date': [Timestamp('2015-08-21 09:35:47'), Timestamp('2015-08-21 09:55:47'), Timestamp('2015-08-21 10:15:47'), Timestamp('2016-08-10 05:39:24'), Timestamp('2016-08-10 05:59:24'), Timestamp('2016-08-10 06:19:24')], 'temp': [27.928000000000001, 28.839000000000002, 28.495000000000001, 11.245999999999999, 11.912000000000001, 13.065999999999999]}) 

groupkey = data['date'].values.astype('datetime64[h]')
result = data.groupby(groupkey).first().set_index('date')
print(result)

yields

                       temp
date                       
2015-08-21 09:35:47  27.928
2015-08-21 10:15:47  28.495
2016-08-10 05:39:24  11.246
2016-08-10 06:19:24  13.066

Upvotes: 4

Related Questions