Reputation: 668
Setup:
dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
I have a dataframe with an ordered list of hours in it. I know the date that corresponds to the first entry. In this case dt0 = '2019-02-01'
I want to add a date column to my df based on dt0
. The complication is that when the hour
columns rolls over to 0
, it's the next day. How can I create a column that accurately represents the date using pandas?
Upvotes: 1
Views: 229
Reputation: 164663
You can use pd.to_timedelta
to construct a timedelta
series and sum with a pd.Timestamp
object:
df['date'] = pd.Timestamp(dt0) + pd.to_timedelta(df['hour'].eq(0).cumsum(), unit='D')
print(df)
# hour date
# 0 20 2019-02-01
# 1 21 2019-02-01
# 2 22 2019-02-01
# 3 23 2019-02-01
# 4 0 2019-02-02
# 5 1 2019-02-02
# 6 2 2019-02-02
# 7 3 2019-02-02
# 8 4 2019-02-02
# 9 5 2019-02-02
# 10 6 2019-02-02
# 11 7 2019-02-02
# 12 8 2019-02-02
# 13 9 2019-02-02
# 14 10 2019-02-02
Upvotes: 0
Reputation: 61910
If I understood correctly, you could do:
import numpy as np
import pandas as pd
dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
result = pd.to_datetime(dt0) + pd.Series([pd.DateOffset(d) for d in df.hour.eq(0).cumsum()])
print(result)
Output
0 2019-02-01
1 2019-02-01
2 2019-02-01
3 2019-02-01
4 2019-02-02
5 2019-02-02
6 2019-02-02
7 2019-02-02
8 2019-02-02
9 2019-02-02
10 2019-02-02
11 2019-02-02
12 2019-02-02
13 2019-02-02
14 2019-02-02
dtype: datetime64[ns]
The key here is df.hour.eq(0).cumsum()
it checks the values that are equals to 0 and then do a cumulative sum on them. For example above it is:
0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
Name: hour, dtype: int64
Upvotes: 1