Reputation: 151
I have a long time serie that starts in 1963 and ends in 2013. However, from 1963 til 2007 it has an hourly sampling period while after 2007's sampling rate changes to 5 minutes. Is it possible to resample data just after 2007 in a way that the entire time serie has hourly data sampling? Data slice below.
yr, m, d, h, m, s, sl
2007, 11, 30, 19, 0, 0, 2180
2007, 11, 30, 20, 0, 0, 2310
2007, 11, 30, 21, 0, 0, 2400
2007, 11, 30, 22, 0, 0, 2400
2007, 11, 30, 23, 0, 0, 2270
2008, 1, 1, 0, 0, 0, 2210
2008, 1, 1, 0, 5, 0, 2210
2008, 1, 1, 0, 10, 0, 2210
2008, 1, 1, 0, 15, 0, 2200
2008, 1, 1, 0, 20, 0, 2200
2008, 1, 1, 0, 25, 0, 2200
2008, 1, 1, 0, 30, 0, 2200
2008, 1, 1, 0, 35, 0, 2200
2008, 1, 1, 0, 40, 0, 2200
2008, 1, 1, 0, 45, 0, 2200
2008, 1, 1, 0, 50, 0, 2200
2008, 1, 1, 0, 55, 0, 2200
2008, 1, 1, 1, 0, 0, 2190
2008, 1, 1, 1, 5, 0, 2190
Thanks!
Upvotes: 1
Views: 1522
Reputation: 294488
Give your dataframe proper column names
df.columns = 'year month day hour minute second sl'.split()
Solution
df.groupby(['year', 'month', 'day', 'hour'], as_index=False).first()
year month day hour minute second sl
0 2007 11 30 19 0 0 2180
1 2007 11 30 20 0 0 2310
2 2007 11 30 21 0 0 2400
3 2007 11 30 22 0 0 2400
4 2007 11 30 23 0 0 2270
5 2008 1 1 0 0 0 2210
6 2008 1 1 1 0 0 2190
Option 2
Here is an option that builds off of the column renaming. We'll use pd.to_datetime
to cleverly get at our dates, then use resample
. However, you have time gaps and will have to address nulls and re-cast dtypes.
df.set_index(
pd.to_datetime(df.drop('sl', 1))
).resample('H').first().dropna().astype(df.dtypes)
year month day hour minute second sl
2007-11-30 19:00:00 2007 11 30 19 0 0 2180
2007-11-30 20:00:00 2007 11 30 20 0 0 2310
2007-11-30 21:00:00 2007 11 30 21 0 0 2400
2007-11-30 22:00:00 2007 11 30 22 0 0 2400
2007-11-30 23:00:00 2007 11 30 23 0 0 2270
2008-01-01 00:00:00 2008 1 1 0 0 0 2210
2008-01-01 01:00:00 2008 1 1 1 0 0 2190
Upvotes: 2
Reputation: 428
You'd better first append a datetime column to your dataframe:
df['datetime'] = pd.to_datetime(df[['yr', 'mnth', 'd', 'h', 'm', 's']])
But before that you should rename the month column:
df.rename(columns={ df.columns[1]: "mnth" })
Then you set a datetime column as dataframe index:
data.set_index('datetime', inplace=True)
Now you can apply resample method on your dataframe on by prefereed sampling rate:
df.resample('60T', on='datatime').mean()
Here I used mean to aggregate. You can use other method based on your need. See Pandas document as a ref.
Upvotes: 2
Reputation: 1474
Rename the minute column for convenience:
df.columns = ['yr', 'm', 'd', 'h', 'M', 's', 'sl']
Create a datetime column:
from datetime import datetime as dt
df['dt'] = df.apply(axis=1, func=lambda x: dt(x.yr, x.m, x.d, x.h, x.M, x.s))
Resample:
For pandas < 0.19:
df = df.set_index('dt').resample('60T').reset_index('dt')
For pandas >= 0.19:
df = df.resample('60T', on='dt')
Upvotes: 2