Reputation: 133
My dataframe has many columns containing categorical data. The categories are the same in every column: ['A great deal' 'Not very much' 'None at all' 'Quite a lot' nan].
I am trying to draw a single bar chart that includes all the columns, but I am struggling because the data is categorical. I tried looping over the columns, plotting the bars for each one and shifting each successive set of bars slightly to the side, but since the x labels are strings rather than numbers, I don't see how to do that.
Here is a sample of the data I'm using:
{'Confidence: The Press': {0: 'A great deal',
1: 'Not very much',
2: 'None at all',
3: 'Not very much',
4: 'Not very much'},
'Confidence: Labor Unions': {0: 'A great deal',
1: 'None at all',
2: 'Not very much',
3: 'Not very much',
4: 'Quite a lot'},
'Confidence: The Police': {0: 'A great deal',
1: 'Not very much',
2: 'Quite a lot',
3: 'Not very much',
4: 'Quite a lot'},
'Confidence: Justice System/Courts': {0: 'A great deal',
1: 'Not very much',
2: 'Quite a lot',
3: 'Not very much',
4: 'Quite a lot'},
'Confidence: The Government': {0: 'A great deal',
1: 'None at all',
2: 'Not very much',
3: 'Not very much',
4: 'Quite a lot'}}
Upvotes: 2
Views: 1256
Reputation: 3670
Here is one way to plot the bar chart using pandas. I assume that you want to plot the counts of the strings in each column of your dataframe, in which case you first need to compute the counts. This can be done by first unpivoting the dataframe with DataFrame.melt and then computing a cross-tabulation with pd.crosstab, assuming that each column contains the same categories. The following example uses the sample data that you have shared and plots the counts with a horizontal bar chart to make the labels readable without any additional formatting:
import pandas as pd # v 1.1.3
# data is the dictionary shown in the question
df = pd.DataFrame(data)
dfmelted = df.melt()
dfmelted.head()
#                 variable          value
# 0  Confidence: The Press   A great deal
# 1  Confidence: The Press  Not very much
# 2  Confidence: The Press    None at all
# 3  Confidence: The Press  Not very much
# 4  Confidence: The Press  Not very much
ctab = pd.crosstab(index=dfmelted['variable'], columns=dfmelted['value'])
ctab
# value                              A great deal  None at all  Not very much  Quite a lot
# variable
# Confidence: Justice System/Courts             1            0              2            2
# Confidence: Labor Unions                      1            1              2            1
# Confidence: The Government                    1            1              2            1
# Confidence: The Police                        1            0              2            2
# Confidence: The Press                         1            1              3            0
ctab.plot.barh(figsize=(6,8), xlabel='count');
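As a cross-check on the counts, here is a minimal sketch of a different route to the same table: calling value_counts on each column instead of melt plus crosstab. This is an alternative, not the method used above. It uses two columns of the sample data from the question to stay short.

```python
import pandas as pd

# Two columns taken from the sample data in the question
df = pd.DataFrame({
    'Confidence: The Press': ['A great deal', 'Not very much', 'None at all',
                              'Not very much', 'Not very much'],
    'Confidence: The Police': ['A great deal', 'Not very much', 'Quite a lot',
                               'Not very much', 'Quite a lot'],
})

# Count the categories in each column, then transpose so that each
# original column becomes a row, as in the crosstab output.
# Categories absent from a column come back as NaN, so fill them with 0.
counts = df.apply(pd.Series.value_counts).T.fillna(0).astype(int)
print(counts)
```

The resulting table can be plotted the same way, with counts.plot.barh(). The column order may differ from the crosstab output.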
Note that nan values are ignored. If you want to include them in the plot, you need to convert them to strings.
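A minimal sketch of that conversion, assuming fillna is an acceptable way to turn the missing values into strings. The two-column frame and the 'No answer' label are made up for illustration and are not part of the data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing answer (not the question's data)
df = pd.DataFrame({
    'Confidence: The Press': ['A great deal', np.nan, 'None at all'],
    'Confidence: The Police': ['Quite a lot', 'Not very much', 'Quite a lot'],
})

# Replace missing values with a string label (the label is an arbitrary
# choice) so that they are counted as a category of their own
dfmelted = df.fillna('No answer').melt()
ctab = pd.crosstab(index=dfmelted['variable'], columns=dfmelted['value'])
print(ctab)
```

The table now has a 'No answer' column, which shows up as an extra bar in each group when plotted with ctab.plot.barh().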
Upvotes: 4