I am visualizing the Titanic dataset. I created 9 age categories and am trying to plot Age_Cats vs. Survived as a bar chart, using the following code:
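import pandas as pd
import seaborn as sns
age_cats = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Split Age into 9 equal-width bins labelled 1..9 (df_train: Titanic training DataFrame)
df_train['Age_Cats'] = pd.cut(df_train['Age'], 9, labels = age_cats)
sns.barplot(x = 'Age_Cats', y = 'Survived', hue = 'Sex', data = df_train)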
I don't understand what the numbers on the Y-axis represent.
My assumption is that each bar shows
$\frac{n(\text{Survived}=1)}{n(\text{Survived}=1) + n(\text{Survived}=0)}$,
i.e. the ratio of people who survived to all people in that category. But how is seaborn calculating it? Or do the numbers on the Y-axis represent something else?
Each bar shows the survival rate, i.e. the fraction (between 0 and 1) of people in that category who survived.
E.g. in age class 1, 60% of all males survived; in age class 7, less than 15% of all males survived.
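The bar heights can be reproduced without plotting by grouping on the same two variables the plot uses (the x column and the hue column) and averaging Survived. A minimal sketch, assuming a small made-up DataFrame that only mimics the columns of df_train; it is not the real Titanic data, so the numbers differ from the plot:

```python
import pandas as pd

# Small made-up stand-in for df_train (NOT the real Titanic data)
df_train = pd.DataFrame({
    "Age_Cats": [1, 1, 1, 7, 7, 7, 7],
    "Sex": ["male", "male", "female", "male", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1, 1],
})

# Mean of the 0/1 Survived column per (age class, sex) pair,
# which corresponds to one bar per x value and hue level in the bar plot
bar_heights = df_train.groupby(["Age_Cats", "Sex"])["Survived"].mean()
print(bar_heights)
```

With the real df_train, Age_Cats is a categorical column produced by pd.cut, so recent pandas versions may warn about the default of groupby's observed parameter; passing observed=True keeps only the combinations that actually occur.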
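This is calculated by taking the mean of the Survived variable within each age class (and, because of hue='Sex', within each sex); the mean is the default estimator of seaborn's barplot. E.g. if a class had 3 people, 2 of whom survived, the Survived values could look like [1, 0, 1], whose mean is (1 + 0 + 1)/3 ≈ 0.67, so the bar would reach about 0.67. Because Survived is coded as 0/1, summing it counts the survivors and dividing by the group size gives exactly the ratio from the question.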
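The three-person example can be checked numerically: the mean of a 0/1 array equals survivors divided by everyone in the group, which is the formula from the question. A small sketch using the toy values from the answer:

```python
import numpy as np

# Toy group from the example: 3 people, 2 of whom survived
survived = np.array([1, 0, 1])

# Mean of the 0/1 values (what the bar height represents)
mean_estimate = survived.mean()

# The asker's formula: n(Survived = 1) / (n(Survived = 1) + n(Survived = 0))
n_survived = np.sum(survived == 1)
n_died = np.sum(survived == 0)
ratio = n_survived / (n_survived + n_died)

print(mean_estimate, ratio)  # both equal 2/3, about 0.667
```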