Gary

Reputation: 2167

How to add a column to pandas dataframe based on time from another column

I am trying to add a column to a pandas DataFrame that holds Morning, Afternoon or Evening, based on time slots that I choose.

The code I am trying is as follows:

df_agg['timeOfDay'] = df_agg.apply(lambda _: '', axis=1)
for i in range (len(df_agg)):
        if df_agg['time_stamp'].iloc[i][0].hour < 12:
            df_agg['timeOfDay'].iloc[i] = 'Morning'
        elif df_agg['time_stamp'].iloc[i][0].hour < 17 & df_agg['time_stamp'].iloc[i][0].hour > 12:
            df_agg['timeOfDay'].iloc[i] = 'Afternoon'
        else:
             df_agg['timeOfDay'].iloc[i] = 'Evening'

When I return df_agg, the timeOfDay column is empty. Does anyone know what I am doing wrong when trying to set these values row by row, based on the time of day?

Upvotes: 2

Views: 2151

Answers (2)

piRSquared

Reputation: 294228

pandas
Use pd.cut to split the hours into bins and attach a label to each bin. This method also makes it trivial to create more granular time slots.

df_agg.assign(
    timeOfDay=pd.cut(
        df_agg.time_stamp.dt.hour,
        [-1, 12, 17, 24],
        labels=['Morning', 'Afternoon', 'Evening']))
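A self-contained sketch of the pd.cut approach, assuming a hypothetical df_agg with one timestamp per hour over a single day (the sample data is invented for illustration). It also shows one possible variant, `right=False`, for the case where hour 12 should count as Afternoon instead of Morning. Note that assign returns a new DataFrame, so the result has to be stored.

```python
import pandas as pd

# Hypothetical sample data: one timestamp per hour across a full day.
df_agg = pd.DataFrame({
    'time_stamp': pd.date_range('2015-02-24 00:00:00', periods=24, freq='1h'),
})
hours = df_agg['time_stamp'].dt.hour

# Default bins are right-inclusive: (-1, 12], (12, 17], (17, 24].
# Hour 12 lands in Morning and hour 17 in Afternoon.
df_agg = df_agg.assign(
    timeOfDay=pd.cut(
        hours,
        bins=[-1, 12, 17, 24],
        labels=['Morning', 'Afternoon', 'Evening'],
    )
)

# Variant with left-inclusive bins: [0, 12), [12, 17), [17, 24).
# Hour 12 lands in Afternoon and hour 17 in Evening.
df_agg = df_agg.assign(
    timeOfDay_left=pd.cut(
        hours,
        bins=[0, 12, 17, 24],
        right=False,
        labels=['Morning', 'Afternoon', 'Evening'],
    )
)

print(df_agg)
```

Which variant fits depends on where you want the slot boundaries to fall; adding more edges and labels gives finer slots.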

numpy
Using searchsorted on the slot boundaries:

hours = df_agg.time_stamp.dt.hour.values
times = np.array(['Morning', 'Afternoon', 'Evening'])

df_agg.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])
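To make the boundary behaviour of searchsorted visible, here is a small runnable sketch on the same 12-hour sample range used in the setup. The names `edges`, `idx_left`, `idx_right` and the `timeOfDay_right` column are only illustrative. With the default `side='left'`, an hour equal to a boundary stays in the lower slot; `side='right'` pushes it into the upper one.

```python
import numpy as np
import pandas as pd

df_agg = pd.DataFrame({
    'time_stamp': pd.date_range('2015-02-24 10:00:00', periods=12, freq='1h'),
})

hours = df_agg['time_stamp'].dt.hour.values
times = np.array(['Morning', 'Afternoon', 'Evening'])
edges = np.array([12, 17])

# side='left' (default): hour 12 -> index 0 (Morning), hour 17 -> index 1 (Afternoon)
idx_left = edges.searchsorted(hours)

# side='right': hour 12 -> index 1 (Afternoon), hour 17 -> index 2 (Evening)
idx_right = edges.searchsorted(hours, side='right')

# assign returns a new DataFrame, so keep the result.
df_agg = df_agg.assign(
    timeOfDay=times[idx_left],
    timeOfDay_right=times[idx_right],
)
print(df_agg)
```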

Both yield the same result (output image not preserved).


time test
small data set

[image: timing results for the small data set, not preserved]

large data set

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})  

[image: timing results for the large data set, not preserved]
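If you want to reproduce a comparison like this yourself, a rough sketch with the standard timeit module could look as follows. The helper names `with_cut` and `with_searchsorted` are made up for this example, and the numbers depend entirely on your machine and library versions, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})


def with_cut(df):
    # pandas approach: bin the hours and label each bin
    return pd.cut(df['time_stamp'].dt.hour, [-1, 12, 17, 24],
                  labels=['Morning', 'Afternoon', 'Evening'])


def with_searchsorted(df):
    # numpy approach: look up each hour's position among the boundaries
    hours = df['time_stamp'].dt.hour.values
    times = np.array(['Morning', 'Afternoon', 'Evening'])
    return times[np.array([12, 17]).searchsorted(hours)]


runs = 20
for fn in (with_cut, with_searchsorted):
    seconds = timeit.timeit(lambda: fn(df_agg), number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call')
```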


setup
Borrowed @jezrael's setup for df_agg:

import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})
print (df_agg)

Upvotes: 3

jezrael

Reputation: 862541

I think you can use a nested numpy.where. Please check whether you need to change < to <= or > to >= for your slot boundaries:

import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(12)})  
print (df_agg)
     a          time_stamp
0    0 2015-02-24 10:00:00
1    1 2015-02-24 11:00:00
2    2 2015-02-24 12:00:00
3    3 2015-02-24 13:00:00
4    4 2015-02-24 14:00:00
5    5 2015-02-24 15:00:00
6    6 2015-02-24 16:00:00
7    7 2015-02-24 17:00:00
8    8 2015-02-24 18:00:00
9    9 2015-02-24 19:00:00
10  10 2015-02-24 20:00:00
11  11 2015-02-24 21:00:00

hours = df_agg.time_stamp.dt.hour.values
df_agg['timeOfDay'] = np.where(hours <= 12, 'Morning', 
                      np.where(hours >= 17, 'Evening', 'Afternoon'))
print (df_agg)

     a          time_stamp  timeOfDay
0    0 2015-02-24 10:00:00    Morning
1    1 2015-02-24 11:00:00    Morning
2    2 2015-02-24 12:00:00    Morning
3    3 2015-02-24 13:00:00  Afternoon
4    4 2015-02-24 14:00:00  Afternoon
5    5 2015-02-24 15:00:00  Afternoon
6    6 2015-02-24 16:00:00  Afternoon
7    7 2015-02-24 17:00:00    Evening
8    8 2015-02-24 18:00:00    Evening
9    9 2015-02-24 19:00:00    Evening
10  10 2015-02-24 20:00:00    Evening
11  11 2015-02-24 21:00:00    Evening
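The effect of the boundary operators can be checked on a handful of edge hours. This is only an illustrative sketch; the `hours` array and the names `inclusive` and `strict` are invented for the comparison and are not part of df_agg.

```python
import numpy as np

# Hours around the two boundaries, 12 and 17.
hours = np.array([11, 12, 13, 16, 17, 18])

# Inclusive comparisons as used above: 12 -> Morning, 17 -> Evening.
inclusive = np.where(hours <= 12, 'Morning',
            np.where(hours >= 17, 'Evening', 'Afternoon'))

# Strict comparisons: 12 and 17 both fall into Afternoon.
strict = np.where(hours < 12, 'Morning',
         np.where(hours > 17, 'Evening', 'Afternoon'))

for h, a, b in zip(hours, inclusive, strict):
    print(h, a, b)
```

Only the rows for hours 12 and 17 differ between the two variants, so those are the ones to decide on.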

Upvotes: 1
