Reputation: 2708
How can I plot a histogram from the DataFrame below?
I would like to visualize the number of women at each education level, based on the Education column.
Example output from the print statements below:
women in High School 30
women in College 33
women in Bachelor 14
What I tried:
import pandas as pd
import matplotlib.pyplot as plt
# myDataFrame holds the dataset linked at the end of this question
# show up to 1000 rows when printing
pd.set_option('display.max_rows', 1000)
# total number of female rows (a single number)
countFemales = myDataFrame['Gender'].str.contains("Female").sum()
# boolean mask: True where Gender is exactly 'Female'
isFemale = myDataFrame['Gender'] == 'Female'
# filter the DataFrame with the mask, keeping only the female rows
femaleDataframe = myDataFrame[isFemale]
# distinct education levels among women, e.g. Bachelor, College, High School or Below, ...
femaleLevelOfEducation = femaleDataframe.Education.unique()
print("women in High School " + str(femaleDataframe["Education"].str.contains("High School or Below").sum()))
print("women in College " + str(femaleDataframe["Education"].str.contains("College").sum()))
print("women in Bachelor " + str(femaleDataframe["Education"].str.contains("Bachelor").sum()))
femaleDataframe.plot(x=femaleLevelOfEducation, y=countFemales, kind='hist')
plt.show()  # this is where I am stuck
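A minimal sketch of the counting step in a single call instead of three str.contains() checks. The column names Gender and Education and the values "Female" and "High School or Below" are taken from the code above; the six rows are made-up stand-in data, not the linked dataset.

```python
import pandas as pd

# Hypothetical stand-in for myDataFrame; the real data is the linked dataset.
myDataFrame = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Female", "Male", "Female"],
    "Education": ["College", "College", "Bachelor", "High School or Below",
                  "Bachelor", "College"],
})

# Select the Education column of the female rows only, then count each level.
femaleCounts = myDataFrame.loc[myDataFrame["Gender"] == "Female", "Education"].value_counts()

# One line per education level, in the same format as the print statements above.
for level, count in femaleCounts.items():
    print(f"women in {level} {count}")
```

Unlike str.contains(), which matches substrings, value_counts() counts whole values, so each row is counted under exactly one education level.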
Edit
If I do plt.bar(x=femaleLevelOfEducation, y=countFemales, height=60), I get a bar plot, but it does not make sense to me: according to the print statements, the dataset contains:
women in High School 30
women in College 33
women in Bachelor 14
So the question is: why does the y axis extend to 140 rather than to a maximum of 33?
Data set: https://drive.google.com/file/d/1Y8VdU1Y7jGR17vWDspm31PdL-d1BQlDg/view?usp=sharing
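A minimal sketch that reproduces the effect, assuming a total of 77 women (30 + 33 + 14, the three printed counts) as a hypothetical stand-in for countFemales. It uses ax.bar(), which pyplot's plt.bar() wraps. bar() has no y parameter, so y= is forwarded to each bar's Rectangle and sets its bottom edge, while height=60 gives every bar the same height.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

levels = ["High School or Below", "College", "Bachelor"]
countFemales = 77  # hypothetical total: 30 + 33 + 14 from the printed counts

fig, ax = plt.subplots()
# Same call shape as in the question. bar() has no `y` parameter, so `y` is
# passed through to each Rectangle and sets its bottom edge; every bar is 60 tall.
bars = ax.bar(x=levels, y=countFemales, height=60)

bottoms = [rect.get_y() for rect in bars]
tops = [rect.get_y() + rect.get_height() for rect in bars]
print("bar bottoms:", bottoms)
print("bar tops:", tops)
print("y-axis limits:", ax.get_ylim())
```

Every bar runs from 77 to 137, and the default margin pushes the top of the axis a little higher, which is consistent with the roughly 140 described above. Passing the per-level counts as the height argument, and leaving y out, keeps the bars at their actual counts.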
Upvotes: 2
Views: 94
Reputation: 332
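You are getting an incorrect plot because of how the sum() result is used. countFemales is a single number (the total of all female rows), not one count per education level. plt.bar() has no y parameter, so y=countFemales is passed on to each bar as its bottom edge, and height=60 then draws a fixed 60-unit bar on top of that total. That is why the y axis goes far above 33.
For the problem you describe, groupby() may be the best solution.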
See below:
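import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
'gender':['F', 'F', 'F', 'M', 'F', 'F', 'F'],
'edu':['Bachelor', 'Masters','Bachelor','Bachelor','HighSchool','Doctor','Doctor'],
'age':[30,30,31,28,25,29,33]
})
# Both genders: one group of bars per gender, one bar per education level
# df.groupby(['gender', 'edu']).size().unstack().plot(kind='bar')
# Women only: keep rows where gender is 'F', count rows per (gender, edu) pair,
# then unstack so each education level becomes its own bar
df[df['gender']=='F'].groupby(['gender', 'edu']).size().unstack().plot(kind='bar')
plt.show()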
DataFrame used:
gender edu age
0 F Bachelor 30
1 F Masters 30
2 F Bachelor 31
3 M Bachelor 28
4 F HighSchool 25
5 F Doctor 29
6 F Doctor 33
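A possibly simpler variant with the same sample df: calling value_counts() on the edu column of the female rows gives one bar per education level, with the level names on the x axis, instead of a single group of colored bars as groupby().unstack() produces. This is a sketch; the axis labels and the Agg backend line are illustrative additions, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'gender': ['F', 'F', 'F', 'M', 'F', 'F', 'F'],
    'edu': ['Bachelor', 'Masters', 'Bachelor', 'Bachelor', 'HighSchool', 'Doctor', 'Doctor'],
    'age': [30, 30, 31, 28, 25, 29, 33],
})

# One bar per education level; bar height = number of women at that level.
female_edu_counts = df.loc[df['gender'] == 'F', 'edu'].value_counts()
ax = female_edu_counts.plot(kind='bar')
ax.set_xlabel('Education')
ax.set_ylabel('Number of women')
```

Because the bar heights are the per-level counts themselves, the y axis only needs to reach the largest count.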
Upvotes: 1