Reputation: 86
I'm sure there's an easy answer to this and I'm just looking at things wrong, but what's going on with my pyplot histogram? Here's the output; the data contains participants between the ages of 18 and 24, with no fractional ages (nobody's 18.5):
Why are the bins staggered like this? The current width is set to 1, so each bar should be the width of a bin, right? The problem gets even worse when the width is less than 0.5, when the bars look like they're in completely different bins.
Here's the code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
csv = pd.read_csv(r'F:\Python\Delete\Delete.csv')
age = csv.age
gender = csv.gender
new_age = age[~np.isnan(age)]
new_age_f = new_age[gender==2]
new_age_m = new_age[gender==1]
plt.hist(new_age_f, alpha=.80, label='Female', width=1, align='left')
plt.hist(new_age_m, alpha=.80, label='Male', width=1, align='left')
plt.legend()
plt.show()
Thank you!
Upvotes: 3
Views: 4323
Reputation: 339705
plt.hist does not have any argument width. If width is specified, it is given to the underlying patch, meaning that the rectangle is made 1 wide. This has nothing to do with the bin width of the histogram and I would guess there are little to no reasons to ever use width in a histogram call at all.
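To see why the bars look staggered, it may help to inspect the bin edges that are used when no bins are given. The following is a minimal sketch with made-up ages (the `ages` array is an assumption, not the asker's data). It uses np.histogram with 10 equal-width bins, which matches the default number of bins in plt.hist unless rcParams["hist.bins"] has been changed.

```python
import numpy as np

# Hypothetical sample: integer ages from 18 to 24, as described in the question.
ages = np.array([18, 19, 19, 20, 21, 21, 22, 23, 24, 24])

# 10 equal-width bins spanning the data range, as plt.hist does by default.
counts, edges = np.histogram(ages, bins=10)

# Each bin is (24 - 18) / 10 wide, not 1 wide.
bin_width = edges[1] - edges[0]
print("bin width:", bin_width)
print("edges:", edges)
print("counts:", counts)
```

Since the bins are narrower than 1, drawing each rectangle with width=1 makes neighbouring bars overlap, and some bins contain no integer age at all.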
Instead what you want is to specify the bins. You probably also want to use the same bins for both histogram plots.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(5)
import pandas as pd
csv = pd.DataFrame({"age": np.random.randint(18, 27, 20),
                    "gender": np.random.randint(1, 3, 20)})
age = csv.age
gender = csv.gender
new_age = age[~np.isnan(age)]
new_age_f = new_age[gender==2]
new_age_m = new_age[gender==1]
# Integer bin edges shared by both histograms; max + 2 so the oldest age gets its own bin
bins = np.arange(new_age.values.min(), new_age.values.max() + 2)
plt.hist(new_age_f, alpha=.40, label='Female', bins=bins, ec="k")
plt.hist(new_age_m, alpha=.40, label='Male', bins=bins, ec="k")
plt.legend()
plt.show()
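With the bins above, each bar starts at its integer age. If the bars should instead be centred on the ages, one option is to shift the edges by half a unit. This is only a sketch: the helper name `centered_integer_bins` and the sample `ages` array are assumptions made for illustration, and the counts are checked with np.histogram rather than plotted.

```python
import numpy as np

def centered_integer_bins(values):
    """Return width-1 bin edges centred on each integer in the data range."""
    lo, hi = int(np.min(values)), int(np.max(values))
    # Edges at lo - 0.5, lo + 0.5, ..., hi + 0.5
    return np.arange(lo - 0.5, hi + 1.5)

# Hypothetical integer ages between 18 and 24.
ages = np.array([18, 19, 19, 20, 22, 24])

edges = centered_integer_bins(ages)
counts, _ = np.histogram(ages, bins=edges)
print(edges)
print(counts)
```

The resulting edges can be passed as bins=edges to both plt.hist calls, so that the female and male histograms share the same bins.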
Upvotes: 5