Reputation: 283
I have the following code:
import seaborn as sns
import pandas as pd
import os
sns.set_theme(style="whitegrid")
df = pd.read_csv("C:/tmp/all.csv")
sns.boxplot(x="cluster", y="val",
            hue="type", palette=["m", "g"],
            data=df)
sns.despine(offset=10, trim=True)
My CSV is:
index, cluster, type, val
1, 0-10, male, 1
2, 30-40, female, 5
3, 30-40, male, 3
4, 50-60, male, 7
5, 50-60, female, 1
...
The max value of val is 10.
My output is:
But what I want is a boxplot of the values in a grouped way. In my output it looks like I'm getting the number of counts for each cluster, even though the maximum val is actually 10. What am I doing wrong?
Upvotes: 1
Views: 94
Reputation: 517
It seems that there are significant outliers beyond your described upper limit of 10. This can be seen visually in the figure, as well as in the table submitted in comments.
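One way to confirm this before plotting is to inspect val directly. The following is only a minimal sketch: it uses a small made-up frame in place of the real all.csv, and the row with val 250 is invented purely to illustrate what an unexpected outlier would look like.

```python
import io
import pandas as pd

# Made-up sample standing in for C:/tmp/all.csv; the 250 is an invented outlier.
csv_text = """index,cluster,type,val
1,0-10,male,1
2,30-40,female,5
3,30-40,male,3
4,50-60,male,7
5,50-60,female,1
6,50-60,female,250
"""
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics show whether the maximum really is 10.
print(df["val"].describe())

# Rows that exceed the expected upper bound of 10.
too_large = df[df["val"] > 10]
print(too_large)
```

If too_large is not empty for your real file, those rows are what stretch the y axis and flatten the boxes.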
Limit the y range - quick and dirty approach #1
Set the y-axis limits manually like so:
import matplotlib.pyplot as plt
plt.ylim([-1,11])
In your code:
import matplotlib.pyplot as plt # <--- import here
import seaborn as sns
import pandas as pd
import os
sns.set_theme(style="whitegrid")
df = pd.read_csv("C:/tmp/all.csv")
sns.boxplot(x="cluster", y="val",
            hue="type", palette=["m", "g"],
            data=df)
sns.despine(offset=10, trim=True)
plt.ylim([-1,11]) # <--- limit y here
Not showing outliers - quick and dirty approach #2
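Pass showfliers=False to your existing sns.boxplot() call:
sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"],
            data=df, showfliers=False)  # <--- hide outlier markers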
Both approaches focus the graph on the interquartile information. I would prefer approach #1 since it does not remove the outliers, but #2 does not need manual configuration.
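If hard-coding the limits in approach #1 feels too manual, they could be derived from the data instead. The sketch below is an assumption on my part, not something from the question: whisker_limits is a hypothetical helper, the data is made up, and the factor 1.5 mirrors matplotlib's default whisker length. It computes the 1.5 * IQR bounds per cluster/type group and returns the widest range.

```python
import pandas as pd

def whisker_limits(df, value="val", groups=("cluster", "type"), whis=1.5):
    """Hypothetical helper: (low, high) covering the whis*IQR range of every group."""
    grouped = df.groupby(list(groups))[value]
    q1 = grouped.quantile(0.25)
    q3 = grouped.quantile(0.75)
    iqr = q3 - q1
    # Widest bounds across all groups, so no box or whisker gets cut off.
    return (q1 - whis * iqr).min(), (q3 + whis * iqr).max()

# Made-up data: the second group contains an invented outlier of 250.
df = pd.DataFrame({
    "cluster": ["0-10"] * 5 + ["30-40"] * 5,
    "type": ["male"] * 10,
    "val": [1, 2, 3, 4, 5, 2, 4, 6, 8, 250],
})
low, high = whisker_limits(df)
print(low, high)
```

The resulting pair can then be passed to plt.ylim(low, high) in place of the fixed [-1, 11].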
Upvotes: 1