Maduro

Reputation: 725

how to create a bar chart in python with multiple x-axis

I have a dataset with 3 columns: BOROUGHS, COMPLAINT_DATE, OFFENSE

NOTE: the date format is like this: 2010-01-30

I do know how to create a simple bar chart...like this:

df.plot(kind="bar")

But, I need something like this:

[image: example of the desired chart]

This chart shows the 5 boroughs, the number of complaints, and the year, and it uses colors to tell them apart.

First, how do you do something like that? Second, does this type of chart have a name, like multi-bar chart or something like that?

EDIT: [image]

The purple color should be first in the bar, but it says that it has more crime.

EDIT #2: Also, look at these numbers based on 2010 and 2019: [image]

EDIT #3: The bars are too small, and the number of crimes is not shown at the bottom. Thanks. [image]

Upvotes: 1

Views: 3620

Answers (1)

Trenton McKinney

Reputation: 62523

  • The data will need to be grouped and aggregated by count, and then pivoted into the correct shape.
    • Use the .dt accessor to extract the year from the 'complaint_date' column.
  • See pandas.DataFrame.plot & pandas.DataFrame.plot.bar for all the available parameters.
import pandas as pd
import matplotlib.pyplot as plt

# sample data
data = {'boroughs': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x'],
        'complaint_date': ['2020-11-1', '2020-11-1', '2020-11-1', '2019-11-1', '2019-11-1', '2019-11-1', '2020-11-1', '2020-11-1', '2020-11-1', '2019-11-1', '2019-11-1', '2019-11-1', '2019-11-1'],
        'offense': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'f', 'd', 'e', 'f', 'd']}

# create dataframe
df = pd.DataFrame(data)

# convert date column to datetime dtype
df.complaint_date = pd.to_datetime(df.complaint_date)

# groupby year and borough to get count of offenses
dfg = df.groupby([df.complaint_date.dt.year, 'boroughs']).boroughs.count().reset_index(name='count')

# display(dfg)
#    complaint_date boroughs  count
# 0            2019        x      3
# 1            2019        y      2
# 2            2019        z      2
# 3            2020        x      2
# 4            2020        y      2
# 5            2020        z      2

# pivot into the correct form for stacked bar
dfp = dfg.pivot(index='complaint_date', columns='boroughs', values='count')

# display(dfp)
# boroughs        x  y  z
# complaint_date
# 2019            3  2  2
# 2020            2  2  2

# plot
dfp.plot.bar(stacked=True, xlabel='Year Complaint Filed', ylabel='Volume of Complaints')
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=0)

[image: the resulting stacked bar chart]
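The group, count, and pivot steps above can also be collapsed into one call with pandas.crosstab. This is only a sketch of an alternative, not part of the original answer; it rebuilds the same sample data (with zero-padded dates, which is an assumption made here for safe parsing) so that it runs on its own.

```python
import pandas as pd

# same sample data as above, with zero-padded dates
data = {'boroughs': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x'],
        'complaint_date': ['2020-11-01'] * 3 + ['2019-11-01'] * 3
                          + ['2020-11-01'] * 3 + ['2019-11-01'] * 4,
        'offense': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'f', 'd', 'e', 'f', 'd']}

df = pd.DataFrame(data)
df['complaint_date'] = pd.to_datetime(df['complaint_date'])

# rows = year, columns = borough, cells = number of complaints
dfp = pd.crosstab(df['complaint_date'].dt.year, df['boroughs'])
print(dfp)
```

The result has the same layout as dfp above, so it can be passed to dfp.plot.bar(stacked=True) in the same way.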

Response to comment

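  • In response to AttributeError: 'Rectangle' object has no property 'xlabel'
  • pandas probably needs to be updated: the xlabel and ylabel parameters of DataFrame.plot were added in pandas 1.1.0, and this answer was run in version 1.1.3. On older versions, set the labels through matplotlib instead: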
# plot
dfp.plot.bar(stacked=True)
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xlabel('Year Complaint Filed')
plt.ylabel('Volume of Complaints')
plt.xticks(rotation=0)
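Another way to sidestep the version issue is to set the labels on the Axes object that pandas returns, rather than through the plt state machine. The sketch below assumes dfp has the shape shown earlier and rebuilds it by hand so the block is self-contained; the Agg backend line is only there so it runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# hand-built copy of dfp from above (years as index, boroughs as columns)
dfp = pd.DataFrame({'x': [3, 2], 'y': [2, 2], 'z': [2, 2]},
                   index=pd.Index([2019, 2020], name='complaint_date'))
dfp.columns.name = 'boroughs'

# DataFrame.plot.bar returns a matplotlib Axes; rot=0 keeps the year labels horizontal
ax = dfp.plot.bar(stacked=True, rot=0)
ax.set(xlabel='Year Complaint Filed', ylabel='Volume of Complaints')
ax.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')
```

Working with ax directly also makes it easier to put several charts on one figure later, since each call targets a specific Axes.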

A better option than a stacked bar

  • Use seaborn.barplot
  • This will provide a better overall representation of the relative values for each bar.
import seaborn as sns

# use dfg from above

# plot
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(y='complaint_date', x='count', data=dfg, hue='boroughs', orient='h', ax=ax)

# use log scale since you have large numbers
plt.xscale('log')

# relocate the legend
plt.legend(title='Boroughs', bbox_to_anchor=(1.05, 1), loc='upper left')

[image: the resulting horizontal grouped bar chart with a log-scaled x-axis]

  • See question or question to change the format of the x-tick values from exponents to integers.
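One possible way to show plain integers instead of exponents on the log-scaled axis is a matplotlib tick formatter. This is a sketch, not necessarily what the linked questions recommend; the counts below are hypothetical and stand in for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

fig, ax = plt.subplots(figsize=(6, 4))

# hypothetical complaint counts, chosen only to span several orders of magnitude
ax.barh(['2019', '2020'], [1200, 45000])
ax.set_xscale('log')

# render major tick values as integers with thousands separators, e.g. 1,000
formatter = StrMethodFormatter('{x:,.0f}')
ax.xaxis.set_major_formatter(formatter)
```

The same formatter can be applied to the seaborn plot above, because sns.barplot draws on the ax that was passed to it.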

Upvotes: 1
