GGiacomo

Reputation: 75

pandas: convert the number of the day and the hour

I have the following dataframe:

df.head()
Out[195]: 
   day  hour  power_kW
0   24     1   0.38180
1   24     2   0.17802
2   24     3   0.22356
3   24     4   0.17710
4   24     5   0.19389

The day column holds the day of the year: 24 corresponds to the 24th of January (the year is 2013), while 151 corresponds to the 31st of May.

Each day in the dataframe has 24 hours, each with a corresponding measured power value.

I want to obtain a datetime column that contains the year, month, day and hour. To do so, I run this command:

df['date'] = pd.to_datetime(df['day']-1,errors='ignore', unit='d',origin='2013-01')

and the dataframe becomes like this:

df.head()
Out[197]: 
   day  hour  power_kW       date
0   24     1   0.38180 2013-01-24 00:00:00
1   24     2   0.17802 2013-01-24 00:00:00
2   24     3   0.22356 2013-01-24 00:00:00
3   24     4   0.17710 2013-01-24 00:00:00
4   24     5   0.19389 2013-01-24 00:00:00

I would like to add the "hour" information to the 'date' column. I don't want to generate the date column from a time range with a 1h step, because there might be missing data.

Is it possible to do that within the pd.to_datetime call?

Thanks in advance.

Upvotes: 1

Views: 6029

Answers (2)

jpp

Reputation: 164623

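You can also construct a string and specify an explicit format. Since the day column is a day of the year rather than a day of the month, use the %j directive (zero-padded to 3 digits) instead of hardcoding the month:

# year + day of year (3 digits) + hour (2 digits), e.g. '201302401'
s = '2013' + df['day'].astype(str).str.zfill(3) + df['hour'].astype(str).str.zfill(2)
df['date'] = pd.to_datetime(s, format='%Y%j%H')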

print(df)

   day  hour  power_kW                date
0   24     1   0.38180 2013-01-24 01:00:00
1   24     2   0.17802 2013-01-24 02:00:00
2   24     3   0.22356 2013-01-24 03:00:00
3   24     4   0.17710 2013-01-24 04:00:00
4   24     5   0.19389 2013-01-24 05:00:00
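
As a quick check that a string-based approach also covers days outside January, here is a minimal sketch built on the day-of-year directive %j. The helper name timestamps_from_day_of_year and the two sample rows are made up for illustration, and it assumes the hours run from 0 to 23, since %H rejects a value of 24.

```python
import pandas as pd


def timestamps_from_day_of_year(df, year):
    """Hypothetical helper: build timestamps from 'day' (day of year) and 'hour' columns."""
    # Zero-pad so the string always has the fixed layout YYYYjjjHH.
    s = (
        str(year)
        + df["day"].astype(str).str.zfill(3)
        + df["hour"].astype(str).str.zfill(2)
    )
    # %j is the day of the year (001-366), %H the hour (00-23).
    return pd.to_datetime(s, format="%Y%j%H")


# Made-up sample rows: day 24 is 24 January, day 151 is 31 May (2013 is not a leap year).
sample = pd.DataFrame({"day": [24, 151], "hour": [1, 23]})
result = timestamps_from_day_of_year(sample, 2013)
print(result)
```

If the hour column runs from 1 to 24 instead, the string format will fail on hour 24, and adding the hours as a timedelta is likely the safer route.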

Upvotes: 2

sacuL

Reputation: 51335

Starting from the date column you already created, use a pandas timedelta with unit='h':

df['date'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')

>>> df
   day  hour  power_kW                date
0   24     1   0.38180 2013-01-24 01:00:00
1   24     2   0.17802 2013-01-24 02:00:00
2   24     3   0.22356 2013-01-24 03:00:00
3   24     4   0.17710 2013-01-24 04:00:00
4   24     5   0.19389 2013-01-24 05:00:00

You can also do this from scratch, starting from your original dataframe:

df['date'] = (pd.to_datetime(df['day']-1,errors='ignore',
                             unit='d',origin='2013-01') +
              pd.to_timedelta(df['hour'], unit='h'))
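
For reference, a self-contained sketch of the same idea that can be run without the original data. The sample rows are made up: hour 3 is deliberately missing to show that no regular time range is assumed, and the last row uses hour 24 to show that the timedelta simply rolls over to the next day.

```python
import pandas as pd

# Made-up sample data: a gap at hour 3, plus day 151 (31 May 2013) at hour 24.
df = pd.DataFrame({
    "day": [24, 24, 24, 151],
    "hour": [1, 2, 4, 24],
    "power_kW": [0.38180, 0.17802, 0.17710, 0.25],
})

# Day 1 should map to the origin itself, so subtract 1 before using it as an offset in days.
day_start = pd.to_datetime(df["day"] - 1, unit="D", origin="2013-01-01")

# Add the hour as a timedelta; each row is computed independently of its neighbours.
df["date"] = day_start + pd.to_timedelta(df["hour"], unit="h")
print(df)
```

Because every timestamp is derived from its own row, missing hours stay missing rather than shifting the rows that follow.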

Upvotes: 4
