In pandas, make a new column based on order and values

Question

Setup:

dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)

I have a dataframe with an ordered list of hours in it. I know the date that corresponds to the first entry. In this case dt0 = '2019-02-01'

I want to add a date column to my df based on dt0. The complication is that when the hour columns rolls over to 0, it's the next day. How can I create a column that accurately represents the date using pandas?

Dani Mesejo · Accepted Answer

If I understood correctly, you could do:

import numpy as np
import pandas as pd

dt0 = '2019-02-01'
d = {'hour': [20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
result = pd.to_datetime(dt0) + pd.Series([pd.DateOffset(d) for d in df.hour.eq(0).cumsum()])

print(result)

Output

0    2019-02-01
1    2019-02-01
2    2019-02-01
3    2019-02-01
4    2019-02-02
5    2019-02-02
6    2019-02-02
7    2019-02-02
8    2019-02-02
9    2019-02-02
10   2019-02-02
11   2019-02-02
12   2019-02-02
13   2019-02-02
14   2019-02-02
dtype: datetime64[ns]

The key here is df.hour.eq(0).cumsum() it checks the values that are equals to 0 and then do a cumulative sum on them. For example above it is:

0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    1
14    1
Name: hour, dtype: int64

In pandas, make a new column based on order and values

Answers (2)

Related Questions