Reputation: 3005
I'm trying to get the count of occurrences for day and hourly from the pandas DateTime column.
data
import pandas as pd
timeData = [
'2009/6/12 2:00', '2009/6/12 3:00', '2009/6/12 4:00', '2009/6/12 5:00', '2009/6/12 6:00', '2009/6/12 7:00', '2009/6/12 8:00', '2009/6/12 9:00', '2009/6/12 10:00', '2009/6/12 11:00', '2009/6/12 12:00', '2009/6/12 13:00', '2009/6/12 14:00', '2009/6/12 15:00', '2009/6/12 16:00', '2009/6/12 17:00', '2009/6/12 18:00', '2009/6/12 19:00', '2009/6/12 20:00', '2009/6/12 21:00', '2009/6/12 22:00', '2009/6/12 23:00',
'2009/6/13 0:00', '2009/6/13 1:00', '2009/6/13 2:00', '2009/6/13 3:00', '2009/6/13 4:00', '2009/6/13 5:00', '2009/6/13 6:00', '2009/6/13 7:00', '2009/6/13 8:00', '2009/6/13 9:00', '2009/6/13 10:00', '2009/6/13 11:00', '2009/6/13 12:00', '2009/6/13 13:00', '2009/6/13 14:00', '2009/6/13 15:00', '2009/6/13 16:00', '2009/6/13 17:00', '2009/6/13 18:00', '2009/6/13 19:00', '2009/6/13 20:00', '2009/6/13 21:00', '2009/6/13 22:00', '2009/6/13 23:00',
'2009/6/14 0:00', '2009/6/14 1:00', '2009/6/14 2:00', '2009/6/14 3:00', '2009/6/14 4:00', '2009/6/14 5:00', '2009/6/14 6:00', '2009/6/14 7:00', '2009/6/14 8:00', '2009/6/14 9:00', '2009/6/14 10:00', '2009/6/14 11:00', '2009/6/14 12:00', '2009/6/14 13:00', '2009/6/14 14:00', '2009/6/14 15:00', '2009/6/14 16:00', '2009/6/14 17:00', '2009/6/14 18:00', '2009/6/14 19:00', '2009/6/14 20:00', '2009/6/14 21:00', '2009/6/14 22:00', '2009/6/14 23:00',
'2009/6/15 0:00', '2009/6/15 1:00', '2009/6/15 2:00', '2009/6/15 3:00', '2009/6/15 4:00', '2009/6/15 5:00', '2009/6/15 6:00', '2009/6/15 7:00', '2009/6/15 8:00', '2009/6/15 9:00', '2009/6/15 10:00', '2009/6/15 11:00', '2009/6/15 12:00', '2009/6/15 13:00', '2009/6/15 14:00', '2009/6/15 15:00', '2009/6/15 16:00', '2009/6/15 17:00', '2009/6/15 18:00', '2009/6/15 19:00', '2009/6/15 20:00', '2009/6/15 21:00', '2009/6/15 22:00', '2009/6/15 23:00',
'2009/6/15 0:00', '2009/6/16 1:00', '2009/6/16 2:00', '2009/6/16 3:00', '2009/6/16 4:00', '2009/6/16 5:00', '2009/6/16 6:00', '2009/6/16 7:00', '2009/6/16 8:00', '2009/6/16 9:00', '2009/6/16 10:00', '2009/6/16 11:00', '2009/6/16 12:00', '2009/6/16 13:00', '2009/6/16 14:00', '2009/6/16 15:00', '2009/6/16 16:00', '2009/6/16 17:00', '2009/6/16 18:00', '2009/6/16 19:00', '2009/6/16 20:00', '2009/6/16 21:00', '2009/6/16 22:00', '2009/6/16 23:00',
'2009/6/15 0:00', '2009/6/17 1:00', '2009/6/17 2:00', '2009/6/17 3:00', '2009/6/17 4:00', '2009/6/17 5:00', '2009/6/17 6:00', '2009/6/17 7:00', '2009/6/17 8:00', '2009/6/17 9:00', '2009/6/17 10:00', '2009/6/17 11:00', '2009/6/17 12:00', '2009/6/17 13:00', '2009/6/17 14:00', '2009/6/17 15:00', '2009/6/17 16:00', '2009/6/17 17:00', '2009/6/17 18:00', '2009/6/17 19:00', '2009/6/17 20:00', '2009/6/17 21:00', '2009/6/17 22:00', '2009/6/17 23:00',
'2009/6/18 0:00', '2009/6/18 1:00', '2009/6/18 2:00', '2009/6/18 3:00', '2009/6/18 4:00', '2009/6/18 5:00', '2009/6/18 6:00', '2009/6/18 7:00', '2009/6/18 8:00', '2009/6/18 9:00', '2009/6/18 10:00', '2009/6/18 11:00', '2009/6/18 12:00', '2009/6/18 13:00', '2009/6/18 14:00', '2009/6/18 15:00', '2009/6/18 16:00', '2009/6/18 17:00', '2009/6/18 18:00', '2009/6/18 19:00', '2009/6/18 20:00', '2009/6/18 21:00', '2009/6/18 22:00', '2009/6/18 23:00',
'2009/6/15 0:00', '2009/6/19 1:00', '2009/6/19 2:00', '2009/6/19 3:00', '2009/6/19 4:00', '2009/6/19 5:00', '2009/6/19 6:00', '2009/6/19 7:00', '2009/6/19 8:00', '2009/6/19 9:00', '2009/6/19 10:00', '2009/6/19 11:00', '2009/6/19 12:00', '2009/6/19 13:00', '2009/6/19 14:00', '2009/6/19 15:00', '2009/6/19 16:00', '2009/6/19 17:00', '2009/6/19 18:00', '2009/6/19 19:00', '2009/6/19 20:00', '2009/6/19 21:00', '2009/6/19 22:00', '2009/6/19 23:00',
'2009/6/20 0:00', '2009/6/20 1:00', '2009/6/20 2:00', '2009/6/20 3:00', '2009/6/20 4:00', '2009/6/20 5:00', '2009/6/20 6:00', '2009/6/20 7:00', '2009/6/20 8:00', '2009/6/20 9:00', '2009/6/20 10:00', '2009/6/20 11:00', '2009/6/20 12:00', '2009/6/20 13:00', '2009/6/20 14:00', '2009/6/20 15:00', '2009/6/20 16:00', '2009/6/20 17:00', '2009/6/20 18:00', '2009/6/20 19:00', '2009/6/20 20:00', '2009/6/20 21:00', '2009/6/20 22:00', '2009/6/20 23:00',
'2009/6/21 0:00', '2009/6/21 1:00', '2009/6/21 2:00', '2009/6/21 3:00', '2009/6/21 4:00', '2009/6/21 5:00', '2009/6/21 6:00', '2009/6/21 7:00', '2009/6/21 8:00', '2009/6/21 9:00', '2009/6/21 10:00', '2009/6/21 11:00', '2009/6/21 12:00', '2009/6/21 13:00', '2009/6/21 14:00', '2009/6/21 15:00', '2009/6/21 16:00', '2009/6/21 17:00', '2009/6/21 18:00', '2009/6/21 19:00', '2009/6/21 20:00', '2009/6/21 21:00', '2009/6/21 22:00', '2009/6/21 23:00',
'2009/6/22 0:00', '2009/6/22 1:00', '2009/6/22 2:00', '2009/6/22 3:00', '2009/6/22 4:00', '2009/6/22 5:00', '2009/6/22 6:00', '2009/6/22 7:00', '2009/6/22 8:00', '2009/6/22 9:00', '2009/6/22 10:00', '2009/6/22 11:00', '2009/6/22 12:00', '2009/6/22 13:00', '2009/6/22 14:00', '2009/6/22 15:00', '2009/6/22 16:00', '2009/6/22 17:00', '2009/6/22 18:00', '2009/6/22 19:00', '2009/6/22 20:00', '2009/6/22 21:00', '2009/6/22 22:00', '2009/6/22 23:00',
'2009/6/23 0:00', '2009/6/23 1:00', '2009/6/23 2:00', '2009/6/23 3:00', '2009/6/23 4:00']
df = pd.DataFrame({'Timestamp': timeData})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M")
Required output
hours = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
'7a', '8a', '9a','10a','11a',
'12p', '1p', '2p', '3p', '4p', '5p',
'6p', '7p', '8p', '9p', '10p', '11p'];
days = ['Saturday', 'Friday', 'Thursday',
'Wednesday', 'Tuesday', 'Monday', 'Sunday']
output = pd.DataFrame(columns=[hours])
output["Day"] = days
Required Output with values
Day 12a 1a 2a 3a 4a 5a 6a 7a 8a 9a 10a 11a 12p 1p 2p 3p 4p 5p 6p 7p 8p 9p 10p 11p
Saturday
Friday
Thursday
Wednesday
Tuesday
Monday
Sunday
Is there a way to group the data with hourly for each day in python?
Upvotes: 1
Views: 413
Reputation: 863541
Use Series.dt.strftime
with %I%p
for hours in 12h
format, then need some processing for remove last m
, lowercase and remove first 0
, for days use Series.dt.day_name
and pass to crosstab
, last for original order is used DataFrame.reindex
:
hours = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
days = df["Timestamp"].dt.day_name()
df = pd.crosstab(days, hours).reindex(index=days.unique(), columns=hours.unique())
Alternative with assign columns:
df['hours'] = df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0')
df['days'] = df["Timestamp"].dt.day_name()
df = pd.crosstab(df['days'], df['hours']).reindex(index=df['days'].unique(),
columns=df['hours'].unique())
print (df)
Timestamp 2a 3a 4a 5a 6a 7a 8a 9a 10a 11a ... 4p 5p 6p 7p 8p \
Timestamp ...
Friday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Saturday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Sunday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Monday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Tuesday 2 2 2 1 1 1 1 1 1 1 ... 1 1 1 1 1
Wednesday 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1
Thursday 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1
Timestamp 9p 10p 11p 12a 1a
Timestamp
Friday 2 2 2 0 1
Saturday 2 2 2 2 2
Sunday 2 2 2 2 2
Monday 2 2 2 5 2
Tuesday 1 1 1 1 2
Wednesday 1 1 1 0 1
Thursday 1 1 1 1 1
[7 rows x 24 columns]
If need custom order defined in lists use ordered Categorical
s:
hourscat = ['12a', '1a', '2a', '3a', '4a', '5a', '6a',
'7a', '8a', '9a','10a','11a',
'12p', '1p', '2p', '3p', '4p', '5p',
'6p', '7p', '8p', '9p', '10p', '11p'];
dayscat = ['Saturday', 'Friday', 'Thursday',
'Wednesday', 'Tuesday', 'Monday', 'Sunday']
hours = pd.Categorical(df["Timestamp"].dt.strftime('%I%p').str[:-1].str.lower().str.lstrip('0'), ordered=True, categories=hourscat)
days = pd.Categorical(df["Timestamp"].dt.day_name(), ordered=True, categories=dayscat)
df = pd.crosstab(days, hours)
print (df)
col_0 12a 1a 2a 3a 4a 5a 6a 7a 8a 9a ... 2p 3p 4p 5p 6p \
row_0 ...
Saturday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Friday 0 1 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Thursday 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1
Wednesday 0 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1
Tuesday 1 2 2 2 2 1 1 1 1 1 ... 1 1 1 1 1
Monday 5 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
Sunday 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2
col_0 7p 8p 9p 10p 11p
row_0
Saturday 2 2 2 2 2
Friday 2 2 2 2 2
Thursday 1 1 1 1 1
Wednesday 1 1 1 1 1
Tuesday 1 1 1 1 1
Monday 2 2 2 2 2
Sunday 2 2 2 2 2
[7 rows x 24 columns]
Upvotes: 2
Reputation: 262294
You can use:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack().droplevel(0, axis=1).reindex(hours, axis=1)
output:
hour 2a 3a 4a 5a 6a 7a 8a 9a 10a 11a 12p 1p 2p 3p 4p 5p 6p 7p 8p 9p 10p 11p 12a 1a
day
Friday 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 NaN 1.0
Monday 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 5.0 2.0
Saturday 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
Sunday 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
Thursday 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Tuesday 2.0 2.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0
Wednesday 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 NaN 1.0
or, to fill NAs:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['day'] = df['Timestamp'].dt.day_name()
df['hour'] = df['Timestamp'].dt.strftime('%I%p').str[:-1].str.lower().str.strip('0')
hours = df['hour'].unique()
df.groupby(['day', 'hour']).count().unstack(fill_value=0).droplevel(0, axis=1).reindex(hours, axis=1)
hour 2a 3a 4a 5a 6a 7a 8a 9a 10a 11a 12p 1p 2p 3p 4p 5p 6p 7p 8p 9p 10p 11p 12a 1a
day
Friday 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 1
Monday 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2
Saturday 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Sunday 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Thursday 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Tuesday 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
Wednesday 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
Upvotes: 2