Jeremy L.

Reputation: 138

Bin timestamp into custom periods

I'm fairly new to pandas and I am running into a roadblock. I have a data-frame that contains a time-stamp. I would like to add a column to my data-frame which would contain custom period names (strings). For example:

df = pd.DataFrame(pd.date_range('01-01 00:00', periods='72', freq='H'))

I would like to create a column df['Periods'] which would contain custom period names. For instance, Period1 if the time-stamp is between 01-01 00:00 and 01-02 00:00, Period2 otherwise.

I was thinking about using cut but the bins attribute seems to only take integers.

What would you do?

Thank you.

Upvotes: 2

Views: 1426

Answers (1)

Primer

Reputation: 10302

In your df initialization, periods must be a number, not a string: periods=72 rather than periods='72'. The code below assumes the timestamps are in a column named t.

The best approach will depend on how many periods you want to have.

There are at least a couple of ways:

Setup periods:

from datetime import time

import numpy as np  # used by Approach 2

morning_start = time(7)
morning_end = time(12)
evening_start = time(18)
evening_end = time(22)

periods = {'morning':[morning_start, morning_end], 'evening':[evening_start, evening_end]}

Approach 1.

def f(x, periods=periods):
    # Return the name of the first period whose [start, end) hour range contains x
    for k, v in periods.items():
        if v[0].hour <= x.hour < v[1].hour:
            return k
    return 'unknown_period'

df['periods'] = df.t.apply(f)
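As an illustration, here is a self-contained sketch of Approach 1 applied to a small frame. The column name t follows the code in this answer; the sample data, the explicit 2015 start date, and the helper name label_period are assumptions made for the example, not part of the original question.

```python
from datetime import time

import pandas as pd

# Assumed sample data: 72 hourly timestamps in a column named 't'.
df = pd.DataFrame({'t': pd.date_range('2015-01-01 00:00', periods=72, freq='h')})

# Each period is a (start, end) pair; only the hour component is compared.
periods = {
    'morning': (time(7), time(12)),
    'evening': (time(18), time(22)),
}

def label_period(ts, periods=periods):
    """Return the first period whose [start, end) hour range contains ts."""
    for name, (start, end) in periods.items():
        if start.hour <= ts.hour < end.hour:
            return name
    return 'unknown_period'

# Apply the function row by row to build the new column.
df['periods'] = df['t'].apply(label_period)
```

Because the end hour is exclusive, 12:00 falls outside 'morning' and is labelled 'unknown_period'.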

Approach 2.

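df['periods'] = 'unknown_period'  # default label; each pass below fills in one period and keeps the others
for k, v in periods.items():
    in_period = (v[0].hour <= df.t.apply(lambda x: x.hour)) & (df.t.apply(lambda x: x.hour) < v[1].hour)
    df['periods'] = np.where(in_period, k, df['periods'])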
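A possible variation on Approach 2, sketched here under assumptions, builds one boolean condition per period and hands them all to np.select in a single call. It reads the hour through the .dt accessor instead of apply. The sample frame and the 2015 start date are invented for the example, and this variant was not part of the timings quoted in this answer.

```python
from datetime import time

import numpy as np
import pandas as pd

# Assumed sample data: 72 hourly timestamps in a column named 't'.
df = pd.DataFrame({'t': pd.date_range('2015-01-01 00:00', periods=72, freq='h')})

periods = {
    'morning': (time(7), time(12)),
    'evening': (time(18), time(22)),
}

# Extract the hour once for the whole column.
hours = df['t'].dt.hour

# One boolean array per period, using half-open [start, end) hour ranges.
conditions = [
    ((start.hour <= hours) & (hours < end.hour)).to_numpy()
    for start, end in periods.values()
]

# np.select takes the first matching condition for each row, else the default.
df['periods'] = np.select(conditions, list(periods.keys()), default='unknown_period')
```

Since np.select picks the first condition that matches, overlapping periods would resolve in dictionary order.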

With the two periods defined above, the first approach is faster:

1000 loops, best of 3: 658 µs per loop

versus the second:

100 loops, best of 3: 3.31 ms per loop

With only two periods, either approach can be reduced to a one-line expression, with no need to loop through the periods. Note that this labels every timestamp outside the morning range as 'evening':

df['periods'] = np.where((morning_start.hour <= df.t.apply(lambda x: x.hour)) & (df.t.apply(lambda x: x.hour) < morning_end.hour), 'morning', 'evening')
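The question itself asks about date-based rather than hour-based periods. As a hedged sketch of how that case might look, the example below labels rows with a boolean mask and, as a second option, with pd.cut, which in recent pandas versions accepts datetime bin edges. The 2015 year, the column name t, and the outer bin edge of 2015-01-04 are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Assumed sample data: 72 hourly timestamps starting 2015-01-01.
df = pd.DataFrame({'t': pd.date_range('2015-01-01 00:00', periods=72, freq='h')})

start = pd.Timestamp('2015-01-01 00:00')
end = pd.Timestamp('2015-01-02 00:00')

# Option 1: a boolean mask. Period1 inside [start, end), Period2 otherwise.
in_period1 = (df['t'] >= start) & (df['t'] < end)
df['Periods'] = np.where(in_period1, 'Period1', 'Period2')

# Option 2: pd.cut with datetime bin edges; right=False makes each bin [left, right).
edges = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-04'])
df['Periods_cut'] = pd.cut(df['t'], bins=edges, labels=['Period1', 'Period2'], right=False)
```

The two options agree on this sample, but they differ outside the bin edges: the mask assigns 'Period2' to everything else, while pd.cut leaves such rows as NaN.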

Upvotes: 2
