I am drawing a point plot to show the relationship between "workclass", "sex", "occupation" and whether income exceeds 50K. However, the result is a mess: the legend entries overlap, Female and Male are both shown in blue in the legend, and so on.
import seaborn as sns

# Correlate categorical features (`train` is the asker's DataFrame)
grid = sns.FacetGrid(train, row='occupation', height=6, aspect=1.6)  # `size` was renamed `height` in seaborn 0.9
grid.map(sns.pointplot, 'workclass', 'exceeds50K', 'sex', palette='deep', markers=["o", "x"])
grid.add_legend()
Please advise how to fix the plot so that it is sized and displayed correctly. Thanks!
It sounds like 'exceeds50K' is a categorical variable. A point plot draws an estimate (by default the mean) of the y variable for each category, so y needs to be numeric. Assuming this is your dataset:
import pandas as pd
import seaborn as sns
df = pd.read_csv("https://raw.githubusercontent.com/katreparitosh/Income-Predictor-Model/master/Database/adult.csv")
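If you would rather keep a point plot, one option is to turn the label into a 0/1 indicator. This is a minimal sketch, assuming the labels are the strings '>50K' and '<=50K' as in this dataset's income column; the tiny DataFrame and the new column name income_gt50k are made up for illustration. Because pointplot's default estimator is the mean, each point would then show the share of people earning more than 50K, and the groupby below computes those same values:

```python
import pandas as pd

# Hypothetical toy rows using the same label strings as adult.csv's income column
df = pd.DataFrame({
    "workclass": ["Private", "Private", "State-gov", "State-gov", "Private", "State-gov"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "income": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Convert the categorical label into a numeric 0/1 indicator (hypothetical column name)
df["income_gt50k"] = (df["income"] == ">50K").astype(int)

# pointplot's default estimator is the mean, so these are the y values it would draw
share = df.groupby(["workclass", "sex"])["income_gt50k"].mean()
print(share)
```

The same idea applies to the exceeds50K column in the question, provided it holds comparable string labels.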
For the sake of the example, we simplify some categories:
df['native.country'] = [i if i == 'United-States' else 'others' for i in df['native.country'] ]
df['race'] = [i if i == 'White' else 'others' for i in df['race'] ]
df.head()
   age  workclass  fnlwgt  education  education.num  marital.status       occupation   relationship   race     sex  capital.gain  capital.loss  hours.per.week  native.country  income
0   90          ?   77053    HS-grad              9         Widowed                ?  Not-in-family  White  Female             0          4356              40   United-States   <=50K
1   82    Private  132870    HS-grad              9         Widowed  Exec-managerial  Not-in-family  White  Female             0          4356              18   United
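The list comprehensions above can also be written with pandas' vectorized Series.where, which keeps values where the condition is True and substitutes the second argument elsewhere. A small sketch on made-up rows (the real data comes from adult.csv):

```python
import pandas as pd

# Hypothetical sample rows with the same column names as adult.csv
df = pd.DataFrame({
    "native.country": ["United-States", "Mexico", "United-States", "India"],
    "race": ["White", "Black", "Asian-Pac-Islander", "White"],
})

# Keep matching values and replace everything else with 'others'
df["native.country"] = df["native.country"].where(df["native.country"] == "United-States", "others")
df["race"] = df["race"].where(df["race"] == "White", "others")
print(df)
```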
If the y variable is categorical, you might want to use a count plot (a bar plot of counts) instead:
sns.catplot(data=df, x='sex', hue='income', kind='count',
            col='native.country', row='race',
            palette='deep', height=3, aspect=1.6)
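The count plot draws one bar per sex/income pair inside each race/native.country panel, with the bar height being the number of rows. As a sketch for checking those heights (the toy rows are made up for illustration), the same numbers come from a groupby:

```python
import pandas as pd

# Hypothetical toy rows after the category simplification above
df = pd.DataFrame({
    "race": ["White", "White", "others", "White", "others"],
    "native.country": ["United-States", "United-States", "others", "others", "others"],
    "sex": ["Male", "Female", "Male", "Male", "Male"],
    "income": [">50K", "<=50K", "<=50K", ">50K", "<=50K"],
})

# Row count per (race, native.country) facet and (sex, income) bar
counts = df.groupby(["race", "native.country", "sex", "income"]).size()
print(counts)
```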
If y is continuous, for example age, the point plot works:
grid = sns.FacetGrid(df, row='race', height=3, aspect=1.6)
grid.map(sns.pointplot, 'native.country', 'age', 'sex', palette='deep', markers=["o", "x"])
grid.add_legend()
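An alternative worth trying is seaborn's figure-level sns.catplot with kind='point'. It builds the grid itself, takes the category and hue order from the whole dataset, and draws a single legend, whereas FacetGrid.map warns that calling a categorical function such as pointplot without order/hue_order is likely to produce an incorrect plot. A sketch on made-up data with the answer's column names:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as adult.csv
df = pd.DataFrame({
    "race": ["White", "White", "others", "others"] * 3,
    "native.country": ["United-States", "others"] * 6,
    "sex": ["Male", "Female", "Female", "Male", "Male", "Female"] * 2,
    "age": [39, 50, 38, 53, 28, 37, 49, 52, 31, 42, 37, 30],
})

# One row of panels per race; hue colours and markers are shared across panels
g = sns.catplot(data=df, x="native.country", y="age", hue="sex",
                row="race", kind="point", palette="deep", markers=["o", "x"],
                height=3, aspect=1.6)
```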