Reputation: 725
I have a dataset with 3 columns: BOROUGHS, COMPLAINT_DATE, OFFENSE
NOTE: the date format is like this: 2010-01-30
I know how to create a simple bar chart, like this:
df.plot(kind="bar")
But, I need something like this:
This chart shows the 5 boroughs, the number of complaints, and the year. Plus, it uses colors.
First, how do you do something like that? Second, does this type of chart have a name, like multi-bar chart or something like that?
The purple color should be first in the bar, but it says that it has more crime.
EDIT #2:
Plus, look at these numbers based on 2010 and 2019.
EDIT #3:
The chart is too small, and it is not showing the number of crimes at the bottom.
Thanks,
Upvotes: 1
Views: 3620
Reputation: 62523
Use the .dt accessor to extract the year from the 'complaint_date' column.
See pandas.DataFrame.plot & pandas.DataFrame.plot.bar for all the available parameters.
import pandas as pd
import matplotlib.pyplot as plt
# sample data
data = {'boroughs': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x'],
'complaint_date': ['2020-11-1', '2020-11-1', '2020-11-1', '2019-11-1', '2019-11-1', '2019-11-1', '2020-11-1', '2020-11-1', '2020-11-1', '2019-11-1', '2019-11-1', '2019-11-1', '2019-11-1'],
'offense': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'f', 'd', 'e', 'f', 'd']}
# create dataframe
df = pd.DataFrame(data)
# convert date column to datetime dtype
df.complaint_date = pd.to_datetime(df.complaint_date)
# groupby year and borough to get count of offenses
dfg = df.groupby([df.complaint_date.dt.year, 'boroughs']).boroughs.count().reset_index(name='count')
# display(dfg)
   complaint_date boroughs  count
0            2019        x      3
1            2019        y      2
2            2019        z      2
3            2020        x      2
4            2020        y      2
5            2020        z      2
# pivot into the correct form for stacked bar
dfp = dfg.pivot(index='complaint_date', columns='boroughs', values='count')
# display(dfp)
boroughs        x  y  z
complaint_date
2019            3  2  2
2020            2  2  2
# plot
dfp.plot.bar(stacked=True, xlabel='Year Complaint Filed', ylabel='Volume of Complaints')
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=0)
If you get AttributeError: 'Rectangle' object has no property 'xlabel', pandas probably needs to be updated; this was run in version 1.1.3. Otherwise, set the axis labels with matplotlib:
# plot
dfp.plot.bar(stacked=True)
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xlabel('Year Complaint Filed')
plt.ylabel('Volume of Complaints')
plt.xticks(rotation=0)
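The groupby and pivot steps above can also be collapsed into a single call. This is a minimal sketch that swaps in pd.crosstab, which is not what the code above uses; the sample rows are made up for illustration and only mimic the shape of the question's data.

```python
import pandas as pd

# made-up sample rows in the same shape as the question's data
df = pd.DataFrame({
    'boroughs': ['x', 'y', 'z', 'x', 'y', 'z', 'x'],
    'complaint_date': ['2020-11-01', '2020-11-01', '2020-11-01',
                       '2019-11-01', '2019-11-01', '2019-11-01', '2019-11-01'],
})
df['complaint_date'] = pd.to_datetime(df['complaint_date'])

# rows: year of the complaint, columns: borough, values: number of complaints
dfp = pd.crosstab(df['complaint_date'].dt.year, df['boroughs'])
```

The resulting dfp has years as the index and one column per borough, so it should work with dfp.plot.bar(stacked=True) in the same way as the pivoted frame.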
Alternatively, use seaborn.barplot:
import seaborn as sns
# use dfg from above
# plot
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(y='complaint_date', x='count', data=dfg, hue='boroughs', orient='h', ax=ax)
# use log scale since you have large numbers
plt.xscale('log')
# relocate the legend
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
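Regarding the chart being too small and not showing the counts: one option is to enlarge the figure and annotate each bar segment. This is a minimal sketch, assuming matplotlib 3.4 or newer (where Axes.bar_label was added); the frame is hand-built to match the shape of dfp above, and the figsize value is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hand-built stand-in for the pivoted frame dfp (years x boroughs)
dfp = pd.DataFrame({'x': [3, 2], 'y': [2, 2], 'z': [2, 2]}, index=[2019, 2020])
dfp.index.name = 'complaint_date'
dfp.columns.name = 'boroughs'

# a larger figure; rot=0 keeps the year labels horizontal
ax = dfp.plot.bar(stacked=True, figsize=(8, 5), rot=0)

# each container holds the bar segments for one borough
labels = []
for container in ax.containers:
    texts = ax.bar_label(container, label_type='center')
    labels.extend(t.get_text() for t in texts)

ax.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
```

Call plt.show() or ax.figure.savefig(...) afterwards to view the result. With label_type='center' each segment is labeled with its own count rather than the running total.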
Upvotes: 1