Saad Cherkaoui Ikbal

Reputation: 133

Draw multiple bar charts when dealing with categorical data

My dataframe has many columns containing categorical data. The categories are the same in every column: ['A great deal' 'Not very much' 'None at all' 'Quite a lot' nan].

I am trying to draw one bar chart that includes all the columns, but I am struggling because the data is categorical. I tried using a loop to plot the bars for each column in turn, shifting the bars of each subsequent column slightly to the side, but since the x labels are strings rather than numbers, I don't see how to do that.

Here is a sample of the data I'm using:

{'Confidence: The Press': {0: 'A great deal',
  1: 'Not very much',
  2: 'None at all',
  3: 'Not very much',
  4: 'Not very much'},
 'Confidence: Labor Unions': {0: 'A great deal',
  1: 'None at all',
  2: 'Not very much',
  3: 'Not very much',
  4: 'Quite a lot'},
 'Confidence: The Police': {0: 'A great deal',
  1: 'Not very much',
  2: 'Quite a lot',
  3: 'Not very much',
  4: 'Quite a lot'},
 'Confidence: Justice System/Courts': {0: 'A great deal',
  1: 'Not very much',
  2: 'Quite a lot',
  3: 'Not very much',
  4: 'Quite a lot'},
 'Confidence: The Government': {0: 'A great deal',
  1: 'None at all',
  2: 'Not very much',
  3: 'Not very much',
  4: 'Quite a lot'}}

Upvotes: 2

Views: 1256

Answers (1)

Patrick FitzGerald

Reputation: 3670

Here is one way to plot the bar chart using pandas. I assume that you want to plot the counts of the strings for each column of your dataframe, in which case you first need to compute the counts. This can be done by first unpivoting the dataframe with df.melt and then computing a cross-tabulation with pd.crosstab, assuming that each column contains the same categories. The following example uses the sample data that you have shared and plots the counts as a horizontal bar chart so that the labels are readable without any additional formatting:

import pandas as pd  # v 1.1.3

df = pd.DataFrame(data)  # data is the dictionary shared in the question
dfmelted = df.melt()

dfmelted.head()
#                    variable          value
#  0    Confidence: The Press   A great deal
#  1    Confidence: The Press  Not very much
#  2    Confidence: The Press    None at all
#  3    Confidence: The Press  Not very much
#  4    Confidence: The Press  Not very much

ctab = pd.crosstab(index=dfmelted['variable'], columns=dfmelted['value'])

ctab
#  value                             A great deal  None at all  Not very much  Quite a lot
#
#                           variable
#  Confidence: Justice System/Courts            1            0              2            2
#           Confidence: Labor Unions            1            1              2            1
#         Confidence: The Government            1            1              2            1
#             Confidence: The Police            1            0              2            2
#              Confidence: The Press            1            1              3            0

ctab.plot.barh(figsize=(6,8), xlabel='count');

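If you would rather have the categories on the x-axis with one bar per column next to each other, as described in the question, the same cross-tabulation can be built the other way around. This is only a sketch using three of the columns from the sample data; the name ctab_by_category is my own choice:

```python
import pandas as pd

# Three columns taken from the sample data in the question
data = {
    'Confidence: The Press': ['A great deal', 'Not very much', 'None at all',
                              'Not very much', 'Not very much'],
    'Confidence: Labor Unions': ['A great deal', 'None at all', 'Not very much',
                                 'Not very much', 'Quite a lot'],
    'Confidence: The Police': ['A great deal', 'Not very much', 'Quite a lot',
                               'Not very much', 'Quite a lot'],
}
df = pd.DataFrame(data)
dfmelted = df.melt()

# Categories as rows and original columns as columns, so that each
# category gets one group of bars on the x-axis
ctab_by_category = pd.crosstab(index=dfmelted['value'],
                               columns=dfmelted['variable'])

# Vertical grouped bars; rot=0 keeps the category labels horizontal
ax = ctab_by_category.plot.bar(rot=0, figsize=(8, 4))
ax.set_ylabel('count')
```

pandas takes care of shifting the bars within each group, so there is no need to position them manually.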

Note that NaN values are ignored by pd.crosstab. If you want to include them in the plot, you need to convert them to strings first, for example with df.fillna.
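For instance, here is a minimal sketch of one way to do the conversion. The two-column sample with missing answers and the label 'No answer' are made up for illustration; any string would work:

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing missing answers (not the data from the question)
data = {
    'Confidence: The Press': ['A great deal', np.nan, 'None at all', 'Not very much'],
    'Confidence: The Police': ['Quite a lot', 'Not very much', np.nan, np.nan],
}
df = pd.DataFrame(data)

# Replace NaN with a string label so that it is counted like any other category
dfmelted = df.fillna('No answer').melt()
ctab = pd.crosstab(index=dfmelted['variable'], columns=dfmelted['value'])

ax = ctab.plot.barh(figsize=(6, 4))
ax.set_xlabel('count')
```

The missing values then show up as their own bar for each column.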

Upvotes: 4
