bibscy

Reputation: 2708

How to plot a dataframe based on the count of observations?

How can I plot a histogram from the dataframe listed below? I would like to visualize the number of women at each education level, based on the Education column.

Example of the printed output from the code below:

women in High School 30
women in College 33
women in Bachelor 14

What I tried

import pandas as pd
import matplotlib.pyplot as plt

# show up to 1000 rows when printing
pd.set_option('display.max_rows', 1000)

# total number of rows whose Gender contains "Female" (a single number)
countFemales = myDataFrame['Gender'].str.contains("Female").sum()

# boolean Series: True where Gender is 'Female'
isFemale = myDataFrame['Gender']=='Female'

# filter the dataframe with the boolean condition, keeping only the female rows
femaleDataframe = myDataFrame[isFemale]

# extract only the unique values from the female data: Bachelor, College, High School..
femaleLevelOfEducation = femaleDataframe.Education.unique()

print("women in High School " + str(femaleDataframe["Education"].str.contains("High School or Below").sum()))
print("women in College " + str(femaleDataframe["Education"].str.contains("College").sum()))
print("women in Bachelor " + str(femaleDataframe["Education"].str.contains("Bachelor").sum()))

femaleDataframe.plot(x=femaleLevelOfEducation, y=countFemales, kind='hist')
plt.show()  # this is where I am stuck


Edit

If I do plt.bar(x=femaleLevelOfEducation, y=countFemales, height=60), I get the bar plot shown below. However, it does not make sense to me, since according to the print statements the dataset contains:

women in High School 30
women in College 33
women in Bachelor 14

So now the question is, why is the y axis stretching to 140 and not to a maximum of 33?

(image: bar plot with the y axis reaching 140)

Data set: https://drive.google.com/file/d/1Y8VdU1Y7jGR17vWDspm31PdL-d1BQlDg/view?usp=sharing

Upvotes: 2

Views: 94

Answers (1)

Jinto Lonappan

Reputation: 332

You are getting an incorrect count because of the sum() calculations. For the problem you describe, however, groupby() may be the best solution.

See below:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
  'gender':['F', 'F', 'F', 'M', 'F', 'F', 'F'],
  'edu':['Bachelor', 'Masters','Bachelor','Bachelor','HighSchool','Doctor','Doctor'],
  'age':[30,30,31,28,25,29,33]
})

# Count rows per (gender, edu) pair for all genders:
# df.groupby(['gender','edu']).size().unstack().plot(kind='bar')

# Count rows per education level for women only, then draw one bar per level
df[df['gender']=='F'].groupby(['gender', 'edu']).size().unstack().plot(kind='bar')
plt.show()

Output: df_plot_groupby

Dataframe Used:

  gender         edu  age
0      F    Bachelor   30
1      F     Masters   30
2      F    Bachelor   31
3      M    Bachelor   28
4      F  HighSchool   25
5      F      Doctor   29
6      F      Doctor   33
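
For the column names used in the question (Gender and Education), the same idea might look roughly like the sketch below. The sample rows and the helper names count_women_by_education and plot_counts are made up for illustration and are not taken from the linked data set. It uses value_counts(), which gives one count per education level, whereas countFemales in the question is a single total for all women. A bar chart is used rather than kind='hist', because a histogram bins numeric values and the education levels are categories.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the question.
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Female", "Female", "Male"],
    "Education": ["College", "Bachelor", "College", "College",
                  "High School or Below", "Bachelor"],
})


def count_women_by_education(frame):
    """Return one count per Education level for rows where Gender is 'Female'."""
    women = frame[frame["Gender"] == "Female"]
    return women["Education"].value_counts()


def plot_counts(counts):
    """Draw one bar per education level; the bar height is the count."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    ax = counts.plot(kind="bar", rot=0)
    ax.set_xlabel("Education")
    ax.set_ylabel("Number of women")
    plt.show()


counts = count_women_by_education(sample)
print(counts)
```

Calling plot_counts(counts) should then draw the chart, with the y axis scaled to the largest per-level count instead of the overall total.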

Upvotes: 1
