Ailurophile

Reputation: 3005

How to group a pandas datetime column by day and hour?

I'm trying to get the count of occurrences per day and hour from a pandas datetime column.

Data

import pandas as pd
timeData = [
    '2009/6/12 2:00', '2009/6/12 3:00', '2009/6/12 4:00', '2009/6/12 5:00', '2009/6/12 6:00', '2009/6/12 7:00', '2009/6/12 8:00', '2009/6/12 9:00', '2009/6/12 10:00', '2009/6/12 11:00', '2009/6/12 12:00', '2009/6/12 13:00', '2009/6/12 14:00', '2009/6/12 15:00', '2009/6/12 16:00', '2009/6/12 17:00', '2009/6/12 18:00', '2009/6/12 19:00', '2009/6/12 20:00', '2009/6/12 21:00', '2009/6/12 22:00', '2009/6/12 23:00',
    '2009/6/13 0:00', '2009/6/13 1:00', '2009/6/13 2:00', '2009/6/13 3:00', '2009/6/13 4:00', '2009/6/13 5:00', '2009/6/13 6:00', '2009/6/13 7:00', '2009/6/13 8:00', '2009/6/13 9:00', '2009/6/13 10:00', '2009/6/13 11:00', '2009/6/13 12:00', '2009/6/13 13:00', '2009/6/13 14:00', '2009/6/13 15:00', '2009/6/13 16:00', '2009/6/13 17:00', '2009/6/13 18:00', '2009/6/13 19:00', '2009/6/13 20:00', '2009/6/13 21:00', '2009/6/13 22:00', '2009/6/13 23:00',
    '2009/6/14 0:00', '2009/6/14 1:00', '2009/6/14 2:00', '2009/6/14 3:00', '2009/6/14 4:00', '2009/6/14 5:00', '2009/6/14 6:00', '2009/6/14 7:00', '2009/6/14 8:00', '2009/6/14 9:00', '2009/6/14 10:00', '2009/6/14 11:00', '2009/6/14 12:00', '2009/6/14 13:00', '2009/6/14 14:00', '2009/6/14 15:00', '2009/6/14 16:00', '2009/6/14 17:00', '2009/6/14 18:00', '2009/6/14 19:00', '2009/6/14 20:00', '2009/6/14 21:00', '2009/6/14 22:00', '2009/6/14 23:00',
    '2009/6/15 0:00', '2009/6/15 1:00', '2009/6/15 2:00', '2009/6/15 3:00', '2009/6/15 4:00', '2009/6/15 5:00', '2009/6/15 6:00', '2009/6/15 7:00', '2009/6/15 8:00', '2009/6/15 9:00', '2009/6/15 10:00', '2009/6/15 11:00', '2009/6/15 12:00', '2009/6/15 13:00', '2009/6/15 14:00', '2009/6/15 15:00', '2009/6/15 16:00', '2009/6/15 17:00', '2009/6/15 18:00', '2009/6/15 19:00', '2009/6/15 20:00', '2009/6/15 21:00', '2009/6/15 22:00', '2009/6/15 23:00',
    '2009/6/15 0:00', '2009/6/16 1:00', '2009/6/16 2:00', '2009/6/16 3:00', '2009/6/16 4:00', '2009/6/16 5:00', '2009/6/16 6:00', '2009/6/16 7:00', '2009/6/16 8:00', '2009/6/16 9:00', '2009/6/16 10:00', '2009/6/16 11:00', '2009/6/16 12:00', '2009/6/16 13:00', '2009/6/16 14:00', '2009/6/16 15:00', '2009/6/16 16:00', '2009/6/16 17:00', '2009/6/16 18:00', '2009/6/16 19:00', '2009/6/16 20:00', '2009/6/16 21:00', '2009/6/16 22:00', '2009/6/16 23:00',
    '2009/6/15 0:00', '2009/6/17 1:00', '2009/6/17 2:00', '2009/6/17 3:00', '2009/6/17 4:00', '2009/6/17 5:00', '2009/6/17 6:00', '2009/6/17 7:00', '2009/6/17 8:00', '2009/6/17 9:00', '2009/6/17 10:00', '2009/6/17 11:00', '2009/6/17 12:00', '2009/6/17 13:00', '2009/6/17 14:00', '2009/6/17 15:00', '2009/6/17 16:00', '2009/6/17 17:00', '2009/6/17 18:00', '2009/6/17 19:00', '2009/6/17 20:00', '2009/6/17 21:00', '2009/6/17 22:00', '2009/6/17 23:00',
    '2009/6/18 0:00', '2009/6/18 1:00', '2009/6/18 2:00', '2009/6/18 3:00', '2009/6/18 4:00', '2009/6/18 5:00', '2009/6/18 6:00', '2009/6/18 7:00', '2009/6/18 8:00', '2009/6/18 9:00', '2009/6/18 10:00', '2009/6/18 11:00', '2009/6/18 12:00', '2009/6/18 13:00', '2009/6/18 14:00', '2009/6/18 15:00', '2009/6/18 16:00', '2009/6/18 17:00', '2009/6/18 18:00', '2009/6/18 19:00', '2009/6/18 20:00', '2009/6/18 21:00', '2009/6/18 22:00', '2009/6/18 23:00',
    '2009/6/15 0:00', '2009/6/19 1:00', '2009/6/19 2:00', '2009/6/19 3:00', '2009/6/19 4:00', '2009/6/19 5:00', '2009/6/19 6:00', '2009/6/19 7:00', '2009/6/19 8:00', '2009/6/19 9:00', '2009/6/19 10:00', '2009/6/19 11:00', '2009/6/19 12:00', '2009/6/19 13:00', '2009/6/19 14:00', '2009/6/19 15:00', '2009/6/19 16:00', '2009/6/19 17:00', '2009/6/19 18:00', '2009/6/19 19:00', '2009/6/19 20:00', '2009/6/19 21:00', '2009/6/19 22:00', '2009/6/19 23:00',
    '2009/6/20 0:00', '2009/6/20 1:00', '2009/6/20 2:00', '2009/6/20 3:00', '2009/6/20 4:00', '2009/6/20 5:00', '2009/6/20 6:00', '2009/6/20 7:00', '2009/6/20 8:00', '2009/6/20 9:00', '2009/6/20 10:00', '2009/6/20 11:00', '2009/6/20 12:00', '2009/6/20 13:00', '2009/6/20 14:00', '2009/6/20 15:00', '2009/6/20 16:00', '2009/6/20 17:00', '2009/6/20 18:00', '2009/6/20 19:00', '2009/6/20 20:00', '2009/6/20 21:00', '2009/6/20 22:00', '2009/6/20 23:00',
    '2009/6/21 0:00', '2009/6/21 1:00', '2009/6/21 2:00', '2009/6/21 3:00', '2009/6/21 4:00', '2009/6/21 5:00', '2009/6/21 6:00', '2009/6/21 7:00', '2009/6/21 8:00', '2009/6/21 9:00', '2009/6/21 10:00', '2009/6/21 11:00', '2009/6/21 12:00', '2009/6/21 13:00', '2009/6/21 14:00', '2009/6/21 15:00', '2009/6/21 16:00', '2009/6/21 17:00', '2009/6/21 18:00', '2009/6/21 19:00', '2009/6/21 20:00', '2009/6/21 21:00', '2009/6/21 22:00', '2009/6/21 23:00',
    '2009/6/22 0:00', '2009/6/22 1:00', '2009/6/22 2:00', '2009/6/22 3:00', '2009/6/22 4:00', '2009/6/22 5:00', '2009/6/22 6:00', '2009/6/22 7:00', '2009/6/22 8:00', '2009/6/22 9:00', '2009/6/22 10:00', '2009/6/22 11:00', '2009/6/22 12:00', '2009/6/22 13:00', '2009/6/22 14:00', '2009/6/22 15:00', '2009/6/22 16:00', '2009/6/22 17:00', '2009/6/22 18:00', '2009/6/22 19:00', '2009/6/22 20:00', '2009/6/22 21:00', '2009/6/22 22:00', '2009/6/22 23:00',
    '2009/6/23 0:00', '2009/6/23 1:00', '2009/6/23 2:00', '2009/6/23 3:00', '2009/6/23 4:00']


df = pd.DataFrame({'Timestamp': timeData})

df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M")

Required output

hours = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
        '7a', '8a', '9a','10a','11a',
        '12p', '1p', '2p', '3p', '4p', '5p',
        '6p', '7p', '8p', '9p', '10p', '11p']
days = ['Saturday', 'Friday', 'Thursday',
        'Wednesday', 'Tuesday', 'Monday', 'Sunday']

output = pd.DataFrame(columns=hours)

output["Day"] = days

Required output with values

  Day 12a   1a  2a  3a  4a  5a  6a  7a  8a  9a  10a 11a 12p 1p  2p  3p  4p  5p  6p  7p  8p  9p  10p 11p 
Saturday
Friday
Thursday
Wednesday
Tuesday
Monday
Sunday

Is there a way to group the data by hour for each day in Python?

Upvotes: 1

Views: 413

Answers (2)

jezrael

Reputation: 863541

Use Series.dt.strftime with %I%p for hours in 12-hour format; some processing is then needed to remove the last letter (the M of AM/PM), lowercase, and strip the leading 0. For days use Series.dt.day_name; pass both to crosstab, and last restore the original order of appearance with DataFrame.reindex:

hours = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
days = df["Timestamp"].dt.day_name()

df = pd.crosstab(days, hours).reindex(index=days.unique(), columns=hours.unique())
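
To see what each step of the hour-label chain produces, here is a quick sanity check on two sample timestamps (a minimal sketch; the ts name is just for illustration):

ts = pd.Series(pd.to_datetime(['2009-06-12 02:00', '2009-06-12 12:00']))
print(ts.dt.strftime('%I%p'))           # 02AM, 12PM
print(ts.dt.strftime('%I%p').str[:-1])  # drop the trailing M -> 02A, 12P
print(ts.dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0'))  # 2a, 12p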

Alternative, assigning helper columns:

df['hours'] = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
df['days']  = df["Timestamp"].dt.day_name()

df = pd.crosstab(df['days'], df['hours']).reindex(index=df['days'].unique(), 
                                                  columns=df['hours'].unique())

print (df)
Timestamp  2a  3a  4a  5a  6a  7a  8a  9a  10a  11a  ...  4p  5p  6p  7p  8p  \
Timestamp                                            ...                       
Friday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Saturday    2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Sunday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Monday      2   2   2   2   2   2   2   2    2    2  ...   2   2   2   2   2   
Tuesday     2   2   2   1   1   1   1   1    1    1  ...   1   1   1   1   1   
Wednesday   1   1   1   1   1   1   1   1    1    1  ...   1   1   1   1   1   
Thursday    1   1   1   1   1   1   1   1    1    1  ...   1   1   1   1   1   

Timestamp  9p  10p  11p  12a  1a  
Timestamp                         
Friday      2    2    2    0   1  
Saturday    2    2    2    2   2  
Sunday      2    2    2    2   2  
Monday      2    2    2    5   2  
Tuesday     1    1    1    1   2  
Wednesday   1    1    1    0   1  
Thursday    1    1    1    1   1  

[7 rows x 24 columns]
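
Note that Series.unique() preserves order of first appearance, which is why the columns above start at 2a: the first timestamp in the data is 2009/6/12 2:00. A quick check, reusing the hours Series from the first snippet:

print(hours.unique()[:4])   # expected: ['2a' '3a' '4a' '5a']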

If you need the custom order defined in lists, use ordered Categoricals:

hourscat = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
        '7a', '8a', '9a','10a','11a',
        '12p', '1p', '2p', '3p', '4p', '5p',
        '6p', '7p', '8p', '9p', '10p', '11p']
dayscat = ['Saturday', 'Friday', 'Thursday',
        'Wednesday', 'Tuesday', 'Monday', 'Sunday']

hours = pd.Categorical(df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0'), ordered=True, categories=hourscat)
days = pd.Categorical(df["Timestamp"].dt.day_name(), ordered=True, categories=dayscat)

df = pd.crosstab(days, hours)

print (df)
col_0      12a  1a  2a  3a  4a  5a  6a  7a  8a  9a  ...  2p  3p  4p  5p  6p  \
row_0                                               ...                       
Saturday     2   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Friday       0   1   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Thursday     1   1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   
Wednesday    0   1   1   1   1   1   1   1   1   1  ...   1   1   1   1   1   
Tuesday      1   2   2   2   2   1   1   1   1   1  ...   1   1   1   1   1   
Monday       5   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   
Sunday       2   2   2   2   2   2   2   2   2   2  ...   2   2   2   2   2   

col_0      7p  8p  9p  10p  11p  
row_0                            
Saturday    2   2   2    2    2  
Friday      2   2   2    2    2  
Thursday    1   1   1    1    1  
Wednesday   1   1   1    1    1  
Tuesday     1   1   1    1    1  
Monday      2   2   2    2    2  
Sunday      2   2   2    2    2  

[7 rows x 24 columns]
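
Because the crosstab inputs here are plain Categoricals rather than named Series, the axis labels fall back to the generic col_0/row_0. Purely as a cosmetic follow-up (not part of the original answer), they can be renamed with DataFrame.rename_axis:

df = df.rename_axis(index='Day', columns='Hour')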
        

Upvotes: 2

mozway

Reputation: 262294

You can use:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack().droplevel(0, axis=1).reindex(hours, axis=1)

output:

hour        2a   3a   4a   5a   6a   7a   8a   9a  10a  11a  12p   1p   2p   3p   4p   5p   6p   7p   8p   9p  10p  11p  12a   1a
day                                                                                                                              
Friday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  NaN  1.0
Monday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  5.0  2.0
Saturday   2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
Sunday     2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
Thursday   1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
Tuesday    2.0  2.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0
Wednesday  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0

or, to fill NAs:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack(fill_value=0).droplevel(0, axis=1).reindex(hours, axis=1)

output:

hour       2a  3a  4a  5a  6a  7a  8a  9a  10a  11a  12p  1p  2p  3p  4p  5p  6p  7p  8p  9p  10p  11p  12a  1a
day                                                                                                            
Friday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    0   1
Monday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    5   2
Saturday    2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    2   2
Sunday      2   2   2   2   2   2   2   2    2    2    2   2   2   2   2   2   2   2   2   2    2    2    2   2
Thursday    1   1   1   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    1   1
Tuesday     2   2   2   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    1   2
Wednesday   1   1   1   1   1   1   1   1    1    1    1   1   1   1   1   1   1   1   1   1    1    1    0   1

Upvotes: 2
