Reputation: 529
I have a dataframe,df after converting DateTimeC column to a datetime type such as:
Index DateTimeC eventName
0 2017-08-20 01:11:24.210000 ABC
1 2017-08-20 01:11:30.224000 CDE
2 2017-08-20 02:16:30.210000 CDE
3 2017-08-20 02:27:30.211000 CDE
2 2017-09-10 01:30:40.212000 DEF
3 2017-09-11 01:35:23.122000 CDE
4 2017-09-11 02:22:22.145000 CDE
5 2017-09-16 02:26:11.222000 DEF
I intend to groupby month and hour and count the count of events in eventName within the grouped object.So applying this code:
df2=df.groupby([df['DateTimeC'].dt.month,df['DateTimeC'].dt.hour])['EventName'].count()
I get:
Index EventName
8,1 2
8,2 2
9,1 2
9,2 2
However, i want to use the pivot_table () on the resulting series as month as the index and hour as the column while the value parameter should be the frequency. So the resulting dataframe should be:
Index 0 1 2 3 4 5 6 7 8... 24
8 0 2 2 0 0 0 0 0 0... 0
9 0 2 2 0 0 0 0 0 0... 0
So what will be the corresponding argument for the parameters of the pivot_table () since the date and time are in the same column: DateTimeC
I tried to add the rename_index to rename the column for the frequency/count result so I can pass the new name to the 'value' parameter in the pivot_table () using this code:
df2=df.groupby([df['DateTimeC'].dt.month,df['DateTimeC'].dt.hour])['EventName'].count().reset_index(name='frequency')
but i get this error:
ValueError: cannot insert DateTimeC, already exists
Also, get the mean per hour for each month and transform the number associated with a specific month to its word equivalent
Index averagePerHour
August 0.17
September 0.17
Upvotes: 1
Views: 977
Reputation: 862841
I think need unstack
for reshape and reindex
if necessary add missing hours:
df2=(df.groupby([df['DateTimeC'].dt.month.rename('month'),
df['DateTimeC'].dt.hour.rename('hour')])
.size()
.unstack(fill_value=0)
.reindex(columns=np.arange(24), fill_value=0))
print (df2)
hour 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 \
month ...
8 0 2 2 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
9 0 2 2 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
hour 21 22 23
month
8 0 0 0
9 0 0 0
[2 rows x 24 columns]
And then create dictionary for rename
with mean
:
L = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
d = dict(enumerate(L, 1))
df3 = df2.mean(axis=1).rename(d).to_frame('averagePerHour')
print (df3)
averagePerHour
month
Aug 0.166667
Sep 0.166667
If omit reindex
and missing some hours then mean
is different:
df2=(df.groupby([df['DateTimeC'].dt.month.rename('month'),
df['DateTimeC'].dt.hour.rename('hour')])
.size()
.unstack(fill_value=0)
)
print (df2)
hour 1 2
month
8 2 2
9 2 2
L = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
d = dict(enumerate(L, 1))
df3 = df2.mean(axis=1).rename(d).to_frame('averagePerHour')
print (df3)
averagePerHour
month
Aug 2.0
Sep 2.0
EDIT: If want convert months number to stings use dt.strftime
, check also http://strftime.org/
:
df2=(df.groupby([df['DateTimeC'].dt.strftime('%B').rename('month'),
df['DateTimeC'].dt.hour.rename('hour')])
.size()
.unstack(fill_value=0)
)
print (df2)
hour 1 2
month
August 2 2
September 2 2
df3 = df2.mean(axis=1).to_frame('averagePerHour')
print (df3)
averagePerHour
month
August 2.0
September 2.0
Upvotes: 1