Reputation: 2167
I am trying to add a column to a pandas DataFrame that holds Morning, Evening or Afternoon, based on the time slots that I choose.
The code I am trying is as follows:
df_agg['timeOfDay'] = df_agg.apply(lambda _: '', axis=1)
for i in range (len(df_agg)):
    if df_agg['time_stamp'].iloc[i][0].hour < 12:
        df_agg['timeOfDay'].iloc[i] = 'Morning'
    elif df_agg['time_stamp'].iloc[i][0].hour < 17 & df_agg['time_stamp'].iloc[i][0].hour > 12:
        df_agg['timeOfDay'].iloc[i] = 'Afternoon'
    else:
        df_agg['timeOfDay'].iloc[i] = 'Evening'
When I return df_agg, the timeOfDay column is empty. Does anyone know what I am doing wrong when trying to fill in these values for each row, based on the time of day?
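For reference, here is a sketch of a corrected loop. It assumes time_stamp is a datetime64 column; the question's code indexes each value with [0], so the real data may be laid out differently. Two things in the original work against it. First, & binds more tightly than <, so hour < 17 & hour > 12 is evaluated as hour < (17 & hour) > 12, which is False for every hour from 13 to 16. Second, df_agg['timeOfDay'].iloc[i] = ... is chained assignment, which may write to a temporary copy instead of df_agg (with pandas Copy-on-Write it never updates df_agg). The sketch also treats hour 12 as Afternoon; that is an assumption, since the original conditions send 12:00 to Evening.

```python
import pandas as pd

# Hypothetical sample data: one timestamp per hour, 10:00 to 21:00
df_agg = pd.DataFrame({
    "time_stamp": pd.date_range("2015-02-24 10:00:00", periods=12, freq="1h"),
})

# The precedence problem in isolation
hour = 14
print(hour < 17 & hour > 12)  # False: parsed as hour < (17 & hour) > 12
print(12 < hour < 17)         # True: a chained comparison does what was intended

labels = []
for ts in df_agg["time_stamp"]:  # iterating a datetime64 Series yields Timestamps
    if ts.hour < 12:
        labels.append("Morning")
    elif 12 <= ts.hour < 17:    # assumption: noon counts as Afternoon
        labels.append("Afternoon")
    else:
        labels.append("Evening")

# Assign the whole column at once instead of writing through .iloc[i] on a column
df_agg["timeOfDay"] = labels
print(df_agg)
```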
Upvotes: 2
Views: 2151
Reputation: 294228
pandas
Use pd.cut to break the hours into bins and give each bin a label. This method also makes it trivial to create more granular time slots.
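import pandas as pd

# assign returns a new DataFrame, so rebind df_agg to keep the column
df_agg = df_agg.assign(
    timeOfDay=pd.cut(
        df_agg.time_stamp.dt.hour,
        [-1, 12, 17, 24],  # right-inclusive bins: (-1, 12], (12, 17], (17, 24]
        labels=['Morning', 'Afternoon', 'Evening']))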
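To illustrate the point about finer time slots, here is a sketch with five bins over one full day. The extra slot names and edges ('Early morning', 'Night', 5 and 21) are hypothetical. pd.cut closes bins on the right by default, so hour 12 lands in (5, 12] and hour 17 in (12, 17].

```python
import pandas as pd

# Hypothetical sample data: one timestamp per hour of a single day
df_agg = pd.DataFrame({
    "time_stamp": pd.date_range("2015-02-24", periods=24, freq="1h"),
})

# Hypothetical finer slots; each label covers the right-inclusive bin before it
bins = [-1, 5, 12, 17, 21, 24]
labels = ["Early morning", "Morning", "Afternoon", "Evening", "Night"]

df_agg = df_agg.assign(
    timeOfDay=pd.cut(df_agg.time_stamp.dt.hour, bins, labels=labels))
print(df_agg["timeOfDay"].value_counts(sort=False))
```

Adding or moving a slot only means editing the bins and labels lists, which keep the same length difference (one more edge than labels).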
numpy
Using searchsorted:
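import numpy as np

hours = df_agg.time_stamp.dt.hour.values
times = np.array(['Morning', 'Afternoon', 'Evening'])
# searchsorted gives 0 for hour <= 12, 1 for 12 < hour <= 17, 2 for hour > 17
df_agg = df_agg.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])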
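To see where the boundaries fall with searchsorted, here it is applied to a few sample hours (hypothetical values). With the default side='left', an hour equal to an edge maps to the slot on its left, which matches pd.cut's right-inclusive bins; side='right' would instead put 12:00 in the afternoon and 17:00 in the evening.

```python
import numpy as np

times = np.array(["Morning", "Afternoon", "Evening"])
edges = np.array([12, 17])
hours = np.array([0, 11, 12, 13, 17, 18, 23])  # sample hours

# side='left' (default): 0 for h <= 12, 1 for 12 < h <= 17, 2 for h > 17
left = edges.searchsorted(hours)
# side='right': 0 for h < 12, 1 for 12 <= h < 17, 2 for h >= 17
right = edges.searchsorted(hours, side="right")

print(left)   # [0 0 0 1 1 2 2]
print(right)  # [0 0 1 1 2 2 2]
print(times[left])
```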
Both yield the same labels (pd.cut as a categorical column, the numpy version as plain strings).
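A quick way to confirm that the pd.cut and searchsorted approaches agree on every hour of a day (sample data assumed). pd.cut returns a categorical, so it is converted to strings before comparing with the NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one timestamp per hour of a single day
df_agg = pd.DataFrame({
    "time_stamp": pd.date_range("2015-02-24", periods=24, freq="1h"),
})
hours = df_agg.time_stamp.dt.hour

# pandas: right-inclusive bins (-1, 12], (12, 17], (17, 24]
via_cut = pd.cut(hours, [-1, 12, 17, 24],
                 labels=["Morning", "Afternoon", "Evening"])

# numpy: position of each hour among the edges [12, 17]
times = np.array(["Morning", "Afternoon", "Evening"])
via_searchsorted = times[np.array([12, 17]).searchsorted(hours.values)]

# Compare as plain Python lists of strings
same = via_cut.astype(str).tolist() == via_searchsorted.tolist()
print(same)  # True
```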
Time test (timing charts for a small and a large data set)
Large data set:
start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})
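To rerun the comparison on the large data set yourself, a rough sketch with the standard-library timeit module follows. with_cut and with_searchsorted are hypothetical helper names wrapping the two approaches above, and the timings will depend on your machine and your pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

start = pd.to_datetime("2015-02-24 10:00:00")
rng = pd.date_range(start, periods=10000, freq="1h")
df_agg = pd.DataFrame({"time_stamp": rng, "a": range(len(rng))})


def with_cut(df):
    # pandas approach: label right-inclusive hour bins
    return df.assign(timeOfDay=pd.cut(df.time_stamp.dt.hour, [-1, 12, 17, 24],
                                      labels=["Morning", "Afternoon", "Evening"]))


def with_searchsorted(df):
    # numpy approach: index a label array by each hour's position among the edges
    times = np.array(["Morning", "Afternoon", "Evening"])
    hours = df.time_stamp.dt.hour.values
    return df.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])


for fn in (with_cut, with_searchsorted):
    secs = timeit.timeit(lambda: fn(df_agg), number=100)
    print(f"{fn.__name__}: {secs / 100 * 1e3:.3f} ms per call")
```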
Setup
Borrowed @jezrael's df_agg setup:
start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})
print(df_agg)
Upvotes: 3
Reputation: 862541
I think you can use a nested numpy.where. Please check whether you need to change < to <= or > to >= for your boundary hours:
import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(12)})
print(df_agg)
a time_stamp
0 0 2015-02-24 10:00:00
1 1 2015-02-24 11:00:00
2 2 2015-02-24 12:00:00
3 3 2015-02-24 13:00:00
4 4 2015-02-24 14:00:00
5 5 2015-02-24 15:00:00
6 6 2015-02-24 16:00:00
7 7 2015-02-24 17:00:00
8 8 2015-02-24 18:00:00
9 9 2015-02-24 19:00:00
10 10 2015-02-24 20:00:00
11 11 2015-02-24 21:00:00
hours = df_agg.time_stamp.dt.hour.values
# hours 0-12 -> Morning, 17-23 -> Evening, 13-16 -> Afternoon
df_agg['timeOfDay'] = np.where(hours <= 12, 'Morning',
                               np.where(hours >= 17, 'Evening', 'Afternoon'))
print(df_agg)
a time_stamp timeOfDay
0 0 2015-02-24 10:00:00 Morning
1 1 2015-02-24 11:00:00 Morning
2 2 2015-02-24 12:00:00 Morning
3 3 2015-02-24 13:00:00 Afternoon
4 4 2015-02-24 14:00:00 Afternoon
5 5 2015-02-24 15:00:00 Afternoon
6 6 2015-02-24 16:00:00 Afternoon
7 7 2015-02-24 17:00:00 Evening
8 8 2015-02-24 18:00:00 Evening
9 9 2015-02-24 19:00:00 Evening
10 10 2015-02-24 20:00:00 Evening
11 11 2015-02-24 21:00:00 Evening
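If more slots are needed, the nested np.where calls can be flattened with np.select, a different NumPy function from the one used above: the conditions are checked in order, the first True one wins, and default covers every row that matches none. A minimal sketch reproducing the same boundaries (hour <= 12 Morning, hour >= 17 Evening) on the same sample data:

```python
import numpy as np
import pandas as pd

start = pd.to_datetime("2015-02-24 10:00:00")
rng = pd.date_range(start, periods=12, freq="1h")
df_agg = pd.DataFrame({"time_stamp": rng, "a": range(12)})
hours = df_agg.time_stamp.dt.hour.values

# Checked in order; anything that matches neither condition gets the default
conditions = [hours <= 12, hours >= 17]
choices = ["Morning", "Evening"]
df_agg["timeOfDay"] = np.select(conditions, choices, default="Afternoon")
print(df_agg)
```

The timeOfDay column matches the nested np.where output above row for row.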
Upvotes: 1