Inflorescence

Reputation: 86

How to stop pyplot from overlapping histogram bins?

I'm sure there's an easy answer to this and I'm just looking at things wrong, but what's going on with my pyplot histogram? Here's the output; the data contains participants between the ages of 18 and 24, with no fractional ages (nobody's 18.5):

[Image: pyplot histogram with overlapping bins]

Why are the bins staggered like this? The current width is set to 1, so each bar should be the width of a bin, right? The problem gets even worse when the width is less than 0.5: the bars then look like they're in completely different bins.

Here's the code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

csv = pd.read_csv(r'F:\Python\Delete\Delete.csv')

age = csv.age
gender = csv.gender

new_age = age[~np.isnan(age)]
new_age_f = new_age[gender==2]
new_age_m = new_age[gender==1]

plt.hist(new_age_f, alpha=.80, label='Female', width=1, align='left')
plt.hist(new_age_m, alpha=.80, label='Male', width=1, align='left')

plt.legend()

plt.show()

Thank you!

Upvotes: 3

Views: 4323

Answers (1)

ImportanceOfBeingErnest

Reputation: 339705

plt.hist does not have a width argument. If width is specified, it is passed on to the underlying patches, so each rectangle is drawn 1 unit wide. This has nothing to do with the bin width of the histogram: the bins are still computed separately for each call (by default, 10 equal-width bins over that call's data range), which is why the bars end up staggered. I would guess there is little to no reason to ever use width in a hist call at all.
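To make the default binning concrete, here is a minimal sketch using np.histogram, which plt.hist relies on for the binning step. The two age arrays are made up for illustration and are not the data from the question.

```python
import numpy as np

# Made-up integer ages for two groups with slightly different ranges
ages_f = np.array([18, 19, 19, 20, 21, 22, 24])
ages_m = np.array([19, 20, 20, 21, 22, 23, 23])

# Default binning: 10 equal-width bins spanning each dataset's own min..max
_, edges_f = np.histogram(ages_f)
_, edges_m = np.histogram(ages_m)

print(np.diff(edges_f)[0])  # bin width for the first group
print(np.diff(edges_m)[0])  # bin width for the second group
```

Neither bin width comes out as 1, and the two groups get different edges, so bars that are forced to be 1 wide cannot line up with the bins or with each other.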

Instead what you want is to specify the bins. You probably also want to use the same bins for both histogram plots.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Reproducible stand-in for the CSV: 20 random ages (18-26) and genders (1 or 2)
np.random.seed(5)
csv = pd.DataFrame({"age": np.random.randint(18, 27, 20),
                    "gender": np.random.randint(1, 3, 20)})

age = csv.age
gender = csv.gender

new_age = age[~np.isnan(age)]
new_age_f = new_age[gender == 2]
new_age_m = new_age[gender == 1]

# One shared set of integer bin edges, one bin per age: min, min+1, ..., max+1
bins = np.arange(new_age.values.min(), new_age.values.max() + 2)

# Same bins for both calls so the bars line up; ec="k" draws black bar edges
plt.hist(new_age_f, alpha=.40, label='Female', bins=bins, ec="k")
plt.hist(new_age_m, alpha=.40, label='Male', bins=bins, ec="k")

plt.legend()

plt.show()

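If you would rather have each bar centered on its integer age (as the align='left' in the question suggests), one possible variation is to shift the shared edges by half a unit. This is only a sketch with made-up ages; centered_bins is a name introduced here for illustration.

```python
import numpy as np

ages = np.array([18, 18, 19, 20, 20, 20, 22, 24])  # made-up integer ages

# Edges at 17.5, 18.5, ..., 24.5: each bin is 1 wide and centered on an integer age
centered_bins = np.arange(ages.min() - 0.5, ages.max() + 1.5)

counts, edges = np.histogram(ages, bins=centered_bins)
centers = (edges[:-1] + edges[1:]) / 2

# Pair each bin center (an integer age) with its count
for center, count in zip(centers, counts):
    print(int(center), int(count))
```

Passing bins=centered_bins to both plt.hist calls (with the default align) should then draw the bars centered on the tick for each age.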

Upvotes: 5
