Reputation: 138
I'm fairly new to pandas and have hit a roadblock. I have a DataFrame that contains a timestamp column, and I would like to add a column containing custom period names (strings). For example:
df = pd.DataFrame(pd.date_range('01-01 00:00', periods='72', freq='H'))
I would like to create a column df['Periods'] containing custom period names: for instance, Period1 if the timestamp is between 01-01 00:00 and 01-02 00:00, and Period2 otherwise.
I was thinking about using cut, but the bins argument seems to only take integers.
What would you do?
Thank you.
Upvotes: 2
Views: 1426
Reputation: 10302
In your df initialization, periods must be a number, not a string.
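A corrected initialization might look like the sketch below. The explicit year (2016) and the column name t are assumptions, the latter chosen to match the df.t used in the code further down. The hourly step is passed as a Timedelta here, which should behave the same as the hourly frequency alias.

```python
import pandas as pd

# Assumed: an explicit year (2016) and a column named 't'.
# periods is the integer 72, not the string '72'.
df = pd.DataFrame({
    't': pd.date_range('2016-01-01 00:00', periods=72,
                       freq=pd.Timedelta(hours=1))
})
```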
I guess the best approach will depend on how many periods you want to have.
There are at least a couple of ways:
Setup periods:
from datetime import time
morning_start = time(7)
morning_end = time(12)
evening_start = time(18)
evening_end = time(22)
periods = {'morning':[morning_start, morning_end], 'evening':[evening_start, evening_end]}
Approach 1.
def f(x, periods=periods):
    for k, v in periods.items():
        if x.hour >= v[0].hour and x.hour < v[1].hour:
            return k
    return 'unknown_period'
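The function above still has to be applied to the timestamp column. A minimal end-to-end sketch, assuming the timestamps live in a column named t (as in the df.t used in Approach 2) and using an assumed sample frame of 72 hourly rows:

```python
from datetime import time
import pandas as pd

periods = {'morning': [time(7), time(12)], 'evening': [time(18), time(22)]}

def f(x, periods=periods):
    # Return the first period whose [start, end) hour range contains x.hour.
    for k, v in periods.items():
        if v[0].hour <= x.hour < v[1].hour:
            return k
    return 'unknown_period'

# Assumed sample data: 72 hourly timestamps in a column named 't'.
df = pd.DataFrame({
    't': pd.date_range('2016-01-01', periods=72, freq=pd.Timedelta(hours=1))
})

# Series.apply calls f once per Timestamp.
df['periods'] = df['t'].apply(f)
```

Note that the end hour is exclusive here, so 12:00 is not counted as morning.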
Approach 2.
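import numpy as np
df['periods'] = 'unknown_period'
for k, v in periods.items():
    in_period = (v[0].hour <= df.t.apply(lambda x: x.hour)) & (df.t.apply(lambda x: x.hour) <= v[1].hour)
    # Fall back to the existing label so earlier periods are not overwritten.
    df['periods'] = np.where(in_period, k, df['periods'])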
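As a possible variation on Approach 2, np.select can label every period in a single call, taking the hour from the .dt accessor once instead of calling apply repeatedly. This is only a sketch: the sample frame and the column name t are assumptions, and the end hour is inclusive to match the comparison used in Approach 2.

```python
from datetime import time
import numpy as np
import pandas as pd

periods = {'morning': [time(7), time(12)], 'evening': [time(18), time(22)]}

# Assumed sample data: 72 hourly timestamps in a column named 't'.
df = pd.DataFrame({
    't': pd.date_range('2016-01-01', periods=72, freq=pd.Timedelta(hours=1))
})

hours = df['t'].dt.hour  # hour of day, extracted once

# One boolean mask per period; the end hour is inclusive, as in Approach 2.
conditions = [
    ((v[0].hour <= hours) & (hours <= v[1].hour)).to_numpy()
    for v in periods.values()
]

# Each row gets the label of the first matching mask, else the default.
df['periods'] = np.select(conditions, list(periods.keys()),
                          default='unknown_period')
```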
With the two periods defined above, the first approach is faster:
1000 loops, best of 3: 658 µs per loop
vs. 2nd:
100 loops, best of 3: 3.31 ms per loop
With only two labels, you could write it as a one-line expression, with no need to loop through the periods. Note that every row outside the morning window is labeled 'evening':
df['periods'] = np.where((morning_start.hour <= df.t.apply(lambda x: x.hour)) & (df.t.apply(lambda x: x.hour) <= morning_end.hour), 'morning', 'evening')
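For the original question, where the label depends on the full timestamp rather than the hour of day, the same np.where pattern can compare against timestamp cutoffs directly. A minimal sketch, assuming the year 2016, a column named t, and a half-open interval (start included, end excluded), none of which the question specifies:

```python
import numpy as np
import pandas as pd

# Assumed sample data: 72 hourly timestamps in a column named 't'.
df = pd.DataFrame({
    't': pd.date_range('2016-01-01', periods=72, freq=pd.Timedelta(hours=1))
})

# Assumed cutoffs for Period1; the end bound is treated as exclusive.
start = pd.Timestamp('2016-01-01 00:00')
end = pd.Timestamp('2016-01-02 00:00')

in_first = (df['t'] >= start) & (df['t'] < end)
df['Periods'] = np.where(in_first, 'Period1', 'Period2')
```

With hourly data this marks the first 24 rows as Period1 and everything after as Period2.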
Upvotes: 2