Reputation: 1587
I'm trying to create histogram from grouped data in pandas.
So far I was able to create standard line plot. But I can't figure out how to do the same to get histogram (bar chart). I would like to get 2 age histograms of persons who survived Titanic crush and who didn't - to see if there is a difference in age distribution.
Source data: https://www.udacity.com/api/nodes/5454512672/supplemental_media/titanic-datacsv/download
So far my code:
import pandas as pn
titanic = pn.DataFrame.from_csv('titanic_data.csv')
SurvivedAge= titanic.groupby(['Survived','Age']).size()
SurvivedAge=SurvivedAge.reset_index()
SurvivedAge.columns=['Survived', 'Age', 'Num']
SurvivedAge.index=(SurvivedAge['Survived'])
del SurvivedAge['Survived']
SurvivedAget=SurvivedAge.reset_index().pivot('Age', 'Survived','Num')
SurvivedAget.plot()
when I'm trying to plot a histogram from this data set I'm getting strange results.
SurvivedAget.hist()
I would be grateful for help with that.
Upvotes: 4
Views: 2818
Reputation: 42875
You can:
titanic = pd.read_csv('titanic_data.csv')
survival_by_age = titanic.groupby(['Age', 'Survived']).size().unstack('Survived')
survival_by_age.columns = ['No', 'Yes']
survival_by_age.plot.bar(title='Survival by Age')
to get:
which you can further tweak. You could also consolidate the fractional ages so you can use integer indices, or bin the data into say 5yr age spans to get more user-friendly output. And then there is seaborn with a various types of distribution plots.
Upvotes: 3