Reputation: 75
I have the following dataframe:
df.head()
Out[195]:
day hour power_kW
0 24 1 0.38180
1 24 2 0.17802
2 24 3 0.22356
3 24 4 0.17710
4 24 5 0.19389
in which the day column represents the number of the day: 24 corresponds to the 24th of January (the year is 2013), while the day number 151 corresponds to the 31st of May.
Each day in the dataframe has 24 hourly rows, each with a measured power value.
I want to obtain a datetime column which contains the year, day, month and hour. To do so I run this command:
df['date'] = pd.to_datetime(df['day']-1,errors='ignore', unit='d',origin='2013-01')
and the dataframe becomes like this:
df.head()
Out[197]:
day hour power_kW date
0 24 1 0.38180 2013-01-24 00:00:00
1 24 2 0.17802 2013-01-24 00:00:00
2 24 3 0.22356 2013-01-24 00:00:00
3 24 4 0.17710 2013-01-24 00:00:00
4 24 5 0.19389 2013-01-24 00:00:00
I would like to add the "hour" information to the 'date' column. I don't want to generate the date column from a time range with a 1h step, because there might be missing data.
Is it possible to do that within the pd.to_datetime call?
Thanks in advance.
Upvotes: 1
Views: 6029
Reputation: 164623
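You can also construct a string and specify an explicit format. Since day is a day-of-year number (151 is the 31st of May), pad it to three digits and parse it with %j rather than hard-coding the month:
s = '2013' + df['day'].astype(str).str.zfill(3) + df['hour'].astype(str).str.zfill(2)
df['date'] = pd.to_datetime(s, format='%Y%j%H')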
print(df)
day hour power_kW date
0 24 1 0.38180 2013-01-24 01:00:00
1 24 2 0.17802 2013-01-24 02:00:00
2 24 3 0.22356 2013-01-24 03:00:00
3 24 4 0.17710 2013-01-24 04:00:00
4 24 5 0.19389 2013-01-24 05:00:00
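One caveat with parsing from strings: %H only accepts 00 to 23, so an hour value of 24 will not parse. Here is a minimal sketch, assuming the hour column can contain 24 and using a day-of-year format (%j), of how errors='coerce' can flag such rows as NaT. The sample frame and the names parsed and bad_rows are made up for illustration:

```python
import pandas as pd

# Hypothetical sample; the second row has hour 24, which %H cannot parse.
df = pd.DataFrame({"day": [24, 24, 151], "hour": [1, 24, 5]})

# Day-of-year (%j, zero-padded to three digits) followed by hour (%H, two digits).
s = "2013" + df["day"].astype(str).str.zfill(3) + df["hour"].astype(str).str.zfill(2)

# errors="coerce" turns strings that do not match the format into NaT.
parsed = pd.to_datetime(s, format="%Y%j%H", errors="coerce")

# Rows whose timestamp could not be parsed, for inspection.
bad_rows = df[parsed.isna()]
print(bad_rows)
```

If rows like these show up, the timedelta approach from the other answer is probably the easier route, since adding 24 hours simply rolls over to the next day.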
Upvotes: 2
Reputation: 51335
Starting from the date column you already created, add a pandas timedelta with unit='h':
df['date'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
>>> df
day hour power_kW date
0 24 1 0.38180 2013-01-24 01:00:00
1 24 2 0.17802 2013-01-24 02:00:00
2 24 3 0.22356 2013-01-24 03:00:00
3 24 4 0.17710 2013-01-24 04:00:00
4 24 5 0.19389 2013-01-24 05:00:00
You can also do this from scratch, starting from your original dataframe:
df['date'] = (pd.to_datetime(df['day']-1, errors='ignore',
                             unit='d', origin='2013-01') +
              pd.to_timedelta(df['hour'], unit='h'))
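For a quick check, here is a self-contained sketch of the same idea on a made-up sample that includes day 151 and hour 24. The sample values are assumptions, not the asker's data, and the origin is written out in full as '2013-01-01':

```python
import pandas as pd

# Hypothetical sample covering a January day, day 151 (31 May), and hour 24.
df = pd.DataFrame({
    "day": [24, 24, 151, 151],
    "hour": [1, 24, 5, 24],
    "power_kW": [0.38180, 0.17802, 0.22356, 0.17710],
})

# Day 1 corresponds to the origin itself, so subtract 1 before offsetting.
dates = pd.to_datetime(df["day"] - 1, unit="D", origin="2013-01-01")

# Add the hour as a timedelta; hour 24 rolls over to 00:00 of the next day.
df["date"] = dates + pd.to_timedelta(df["hour"], unit="h")
print(df)
```

Note that with this convention hour 24 lands on midnight of the following day. If the hour instead labels the end of a measurement interval, subtracting 1 from the hour first may be the better fit.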
Upvotes: 4