Reputation: 145
I have a year-long data set with dates and temperatures that's sample every 15min with occasional gaps in the data in Python. I've saved the dates and temperatures into a dataframe in pandas:
2015-08-21 09:35:47 27.928
2015-08-21 09:55:47 28.839
2015-08-21 10:15:47 28.495
...
2016-08-10 05:39:24 11.246
2016-08-10 05:59:24 11.912
2016-08-10 06:19:24 13.066
I would like to resample the data into every hour and tried with this line:
newDat=data.resample(rule='H')
I also tried using the first and last 'how' but still didn't get what I was looking for. My results looked like:
2015-08-21 09:00:00 28.383500
2015-08-21 10:00:00 27.959667
2015-08-21 11:00:00 29.513333
...
2016-08-10 06:00:00 14.235667
2016-08-10 07:00:00 20.867333
2016-08-10 08:00:00 26.725000
I don't want the actual temperature or time values changed. I would like the command to grab the first times within every hour of the day so I get something like:
2015-08-21 09:15:47 27.925
2015-08-21 10:15:47 28.495
2015-08-21 11:15:47 27.596
...
2016-08-10 07:19:24 16.944
2016-08-10 08:19:24 27.149
2016-08-10 09:19:24 28.152
Upvotes: 1
Views: 1200
Reputation: 879719
The DataFrame.groupby
method can accept a sequence of values as group keys.
The length of the sequence is expected to be the same as the number of rows in the DataFrame, and rows corresponding to equal-valued items in the sequence are grouped.
So you could use dates truncated to the nearest hour as group keys:
In [133]: groupkey = data['date'].values.astype('datetime64[h]'); groupkey
Out[133]:
array(['2015-08-21T09', '2015-08-21T09', '2015-08-21T10', '2016-08-10T05',
'2016-08-10T05', '2016-08-10T06'], dtype='datetime64[h]')
import pandas as pd
Timestamp = pd.Timestamp
data = pd.DataFrame({'date': [Timestamp('2015-08-21 09:35:47'), Timestamp('2015-08-21 09:55:47'), Timestamp('2015-08-21 10:15:47'), Timestamp('2016-08-10 05:39:24'), Timestamp('2016-08-10 05:59:24'), Timestamp('2016-08-10 06:19:24')], 'temp': [27.928000000000001, 28.839000000000002, 28.495000000000001, 11.245999999999999, 11.912000000000001, 13.065999999999999]})
groupkey = data['date'].values.astype('datetime64[h]')
result = data.groupby(groupkey).first().set_index('date')
print(result)
yields
temp
date
2015-08-21 09:35:47 27.928
2015-08-21 10:15:47 28.495
2016-08-10 05:39:24 11.246
2016-08-10 06:19:24 13.066
Upvotes: 4