Kalamazoo
Kalamazoo

Reputation: 141

Pandas pivot table and groupby month and hour

I have data that is in this format: Screenshot of dataframe

I have to create barplots of the counts of id for each area (North,South,Middle) per hour for every month. E.g I have to plot 12 seperate barplots of counts of id hourly for 'North' for the months of Jan - Dec.

import numpy as np
from datetime import time,datetime,timedelta
df.Datetime = pd.to_datetime(Datetime) 
df.groupby('AREA')
North = df.get_group('North')
Middle = df.get_group('Middle')
South = df.get_group('South') 
North = pd.pivot_table(North,index='Datetime', values='id', aggfunc=[np.sum])

I am stuck and not sure how to continue. I am thinking of doing something like 'North.groupby(pd.Grouper(key='Datetime', freq='H'))' but it does not work since I have already indexed datetime. How can I group by months and hours from the dataframe 'North'?

Appreciate if anyone can help, thank you!

Upvotes: 1

Views: 3010

Answers (2)

Rauf Bhat
Rauf Bhat

Reputation: 61

For Distinct Count of ids:

Data['Month'] = Data['Datetime'].apply(lambda x:x.month)

Data['Hour'] = Data['Datetime'].apply(lambda x:x.hour)

Data = pd.DataFrame({'count' : Data.groupby(["Month","Hour","Area"])["id"].nunique()}).reset_index()

Upvotes: 0

Oscar Mike
Oscar Mike

Reputation: 744

df = pandas.DataFrame([
['2017-01-10 08:40:00', 1, 'North'],
['2017-01-10 08:30:00', 1, 'North'],
['2017-01-10 08:40:00', 1, 'North'],
['2017-01-10 15:40:00', 2, 'North'],
['2017-01-10 07:30:00', 2, 'North'],
['2017-01-10 08:40:00', 3, 'North'],
['2017-01-10 08:40:00', 1, 'Middle'],
['2017-01-10 08:30:00', 1, 'Middle'],
['2017-01-10 08:40:00', 1, 'Middle'],
['2017-01-11 16:40:00', 2, 'South'],
['2017-01-11 08:30:00', 2, 'South'],
['2017-01-11 07:40:00', 3, 'South'],
['2017-01-10 08:40:00', 2, 'South'],
['2017-01-10 08:40:00', 2, 'South'],
['2017-01-10 08:40:00', 3, 'South'],
['2017-01-10 08:40:00', 1, 'South']], columns=['Datetime', 'id', 'Area'])

You need to parse your datetimes and then you create two additional column containing the time and date. It depends on how much data analysis you will do after, if you will analyze it by a given time during different days, this can be a good approach:

df['Datetime'] = [datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S') for x in df['Datetime'].tolist()]
df['Date'] = [x.strftime('%Y-%m-%d') for x in df['Datetime'].tolist()]
df['Time'] = ['%s:00' % x.strftime('%H') for x in df['Datetime'].tolist()]

Then you just groupby and unstack your dataframe

df_1 = df.groupby(['Date', 'Time', 'id', 'Area']).count().unstack(['Area', 'id'])

and finally you plot your data

df_1.plot(kind='bar')

enter image description here

Upvotes: 1

Related Questions