rvictordelta
rvictordelta

Reputation: 668

In pandas, make a new column based on order and values

Setup:

dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)

I have a dataframe with an ordered list of hours in it. I know the date that corresponds to the first entry. In this case dt0 = '2019-02-01'

I want to add a date column to my df based on dt0. The complication is that when the hour columns rolls over to 0, it's the next day. How can I create a column that accurately represents the date using pandas?

Upvotes: 1

Views: 229

Answers (2)

jpp
jpp

Reputation: 164663

You can use pd.to_timedelta to construct a timedelta series and sum with a pd.Timestamp object:

df['date'] = pd.Timestamp(dt0) + pd.to_timedelta(df['hour'].eq(0).cumsum(), unit='D')

print(df)

#     hour       date
# 0     20 2019-02-01
# 1     21 2019-02-01
# 2     22 2019-02-01
# 3     23 2019-02-01
# 4      0 2019-02-02
# 5      1 2019-02-02
# 6      2 2019-02-02
# 7      3 2019-02-02
# 8      4 2019-02-02
# 9      5 2019-02-02
# 10     6 2019-02-02
# 11     7 2019-02-02
# 12     8 2019-02-02
# 13     9 2019-02-02
# 14    10 2019-02-02

Upvotes: 0

Dani Mesejo
Dani Mesejo

Reputation: 61910

If I understood correctly, you could do:

import numpy as np
import pandas as pd

dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
result = pd.to_datetime(dt0) + pd.Series([pd.DateOffset(d) for d in df.hour.eq(0).cumsum()])

print(result)

Output

0    2019-02-01
1    2019-02-01
2    2019-02-01
3    2019-02-01
4    2019-02-02
5    2019-02-02
6    2019-02-02
7    2019-02-02
8    2019-02-02
9    2019-02-02
10   2019-02-02
11   2019-02-02
12   2019-02-02
13   2019-02-02
14   2019-02-02
dtype: datetime64[ns]

The key here is df.hour.eq(0).cumsum() it checks the values that are equals to 0 and then do a cumulative sum on them. For example above it is:

0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    1
14    1
Name: hour, dtype: int64

Upvotes: 1

Related Questions