Reputation: 141
I have data that is in this format: Screenshot of dataframe
I have to create barplots of the counts of id for each area (North,South,Middle) per hour for every month. E.g I have to plot 12 seperate barplots of counts of id hourly for 'North' for the months of Jan - Dec.
import numpy as np
from datetime import time,datetime,timedelta
df.Datetime = pd.to_datetime(Datetime)
df.groupby('AREA')
North = df.get_group('North')
Middle = df.get_group('Middle')
South = df.get_group('South')
North = pd.pivot_table(North,index='Datetime', values='id', aggfunc=[np.sum])
I am stuck and not sure how to continue. I am thinking of doing something like 'North.groupby(pd.Grouper(key='Datetime', freq='H'))' but it does not work since I have already indexed datetime. How can I group by months and hours from the dataframe 'North'?
Appreciate if anyone can help, thank you!
Upvotes: 1
Views: 3010
Reputation: 61
For Distinct Count of ids:
Data['Month'] = Data['Datetime'].apply(lambda x:x.month)
Data['Hour'] = Data['Datetime'].apply(lambda x:x.hour)
Data = pd.DataFrame({'count' : Data.groupby(["Month","Hour","Area"])["id"].nunique()}).reset_index()
Upvotes: 0
Reputation: 744
df = pandas.DataFrame([
['2017-01-10 08:40:00', 1, 'North'],
['2017-01-10 08:30:00', 1, 'North'],
['2017-01-10 08:40:00', 1, 'North'],
['2017-01-10 15:40:00', 2, 'North'],
['2017-01-10 07:30:00', 2, 'North'],
['2017-01-10 08:40:00', 3, 'North'],
['2017-01-10 08:40:00', 1, 'Middle'],
['2017-01-10 08:30:00', 1, 'Middle'],
['2017-01-10 08:40:00', 1, 'Middle'],
['2017-01-11 16:40:00', 2, 'South'],
['2017-01-11 08:30:00', 2, 'South'],
['2017-01-11 07:40:00', 3, 'South'],
['2017-01-10 08:40:00', 2, 'South'],
['2017-01-10 08:40:00', 2, 'South'],
['2017-01-10 08:40:00', 3, 'South'],
['2017-01-10 08:40:00', 1, 'South']], columns=['Datetime', 'id', 'Area'])
You need to parse your datetimes and then you create two additional column containing the time and date. It depends on how much data analysis you will do after, if you will analyze it by a given time during different days, this can be a good approach:
df['Datetime'] = [datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S') for x in df['Datetime'].tolist()]
df['Date'] = [x.strftime('%Y-%m-%d') for x in df['Datetime'].tolist()]
df['Time'] = ['%s:00' % x.strftime('%H') for x in df['Datetime'].tolist()]
Then you just groupby and unstack your dataframe
df_1 = df.groupby(['Date', 'Time', 'id', 'Area']).count().unstack(['Area', 'id'])
and finally you plot your data
df_1.plot(kind='bar')
Upvotes: 1