Histogram of a categorical variable with matplotlib

I have a column in a pandas dataframe that has three possible categorical values. When I try to plot it using plt.hist(data['column']) from matplotlib, the histogram bars are not aligned with the x-axis ticks, and they're not evenly spaced. How can I fix this?

enter image description here

Upvotes: 5

Views: 23226

Answers (3)

Patrick FitzGerald
Patrick FitzGerald

Reputation: 3670

Histograms are used to plot the frequency distribution of numerical variables (continuous or discrete). The frequency distribution of categorical variables is best displayed with bar charts. For this, you first need to compute the frequency of each category with value_counts and then you can conveniently plot that directly with pandas plot.bar. Or else with matplotlib if you prefer, as shown below.

import numpy as np               # v 1.19.2
import pandas as pd              # v 1.2.3
import matplotlib.pyplot as plt  # v 3.3.4

data = pd.DataFrame(dict(column=np.repeat(['F', 'M', '--'], [11000, 13000, 3000])))

pandas

data['column'].value_counts().plot.bar(rot=0)

pandas_bar


matplotlib

categories = data['column'].value_counts().index
counts = data['column'].value_counts().values
plt.bar(categories, counts, width=0.5)

matplotlib_bar

Upvotes: 9

gboffi
gboffi

Reputation: 25100

You have to tell Matplotlib how many bins are needed (by default, 10 bins are ALWAYS used in a hist), and you have to specify the position of the labels, taking into account that the x-axis run from 0 to the number of bins minus one (in my example below, from 0 to 2)

from numpy import array, linspace ; from numpy.random import randint
from matplotlib.pyplot import hist, xticks, show

# synthesize some data 
x = array([{0:'A',1:'B',2:'B',3:'B',4:'C',5:'C'}[n] for n in randint(0, 6, 20000)])
nc = len(set(x)) # how many categories
hist(x, bins=nc, rwidth=0.7)
xticks(linspace(0, nc-1, 2*nc+1)[1::2])
show()

enter image description here

Upvotes: 3

Michiel
Michiel

Reputation: 94

The parameter rwidth specifies the width of your bar relative to the width of your bin. For example, if your bin width is said 1 and rwidth=0.5, the bar width will be 0.5. On both sides of the bar, you will have a space of 0.25.

Mind: this gives a space of 0.5 between consecutive bars. With the number of bins you have, you won't see these spaces. But with fewer bins, they do show up.

Upvotes: 0

Related Questions