Reputation: 55
I'm trying to overlay a seaborn lineplot on a seaborn boxplot. The result is somewhat "shocking" :) The two graphs seem to be drawn in the same figure but kept separate: the box plot is compressed on the left side, and the line plot is compressed on the right side.
Notice that if I run the two graphs separately they work fine. I cannot figure out how to make it work. Thank you in advance for any help.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
mydata = pd.DataFrame({
'a':[2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
'v':[383.00, 519.00, 366.00, 436.00, 1348.00, 211.00, 139.00, 614.00, 365.00, 365.00, 383.00, 602.00, 994.00, 719.00, 589.00, 365.00, 990.00, 1142.00, 262.00, 1263.00, 507.00, 222.00, 363.00, 274.00, 195.00, 730.00, 730.00, 592.00, 479.00, 607.00, 292.00, 657.00, 453.00, 691.00, 673.00, 705]
})
means = mydata.groupby('a').v.mean().reset_index()
fig, ax = plt.subplots(figsize=(15,8))
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.lineplot(data=means, x='a', y='v', ax=ax)
plt.show()
Upvotes: 3
Views: 2855
Reputation: 12410
Surprisingly, I did not find a duplicate for this question with a good answer, so I elevate my comment to one. Arise, Sir Comment:
Instead of lineplot, you should use pointplot:
...
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.pointplot(data=means, x='a', y='v', ax=ax)
plt.show()
Pointplot is the equivalent of lineplot for categorical variables, which is the kind of variable boxplot uses on its grouping axis. The seaborn documentation explains the difference between relational and categorical plotting in more detail.
The question came up why there is no problem with lineplot for the following data:
mydata = pd.DataFrame({'a':["m1", "m1", "m1", "m2", "m2", "m2", "m2", "m3", "m3", "m3", "m3", "m4", "m4", "m4", "m4"], 'v':[11.37, 11.31, 10.93, 9.43, 9.62, 6.61, 9.31, 11.27, 8.47, 11.86, 8.77, 8.8, 9.58, 12.26, 10] })
means = mydata.groupby('a').v.mean().reset_index()
print(means)
fig, ax = plt.subplots(figsize=(15,8))
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.lineplot(data=means, x='a', y='v', ax=ax)
plt.show()
The difference is that this example has no ambiguity for lineplot. Seaborn's lineplot accepts both categorical and numerical data. Seemingly, the code first tries to plot x as numerical data and, if that is not possible, treats it as a categorical variable (I don't know the source code). In the first example the years are numbers, so lineplot places the means at x = 2012 to 2020, while boxplot places its boxes at the categorical positions 0 to 8; the two plots therefore end up at opposite ends of the axis. With string labels like "m1", both functions use the same categorical positions. This is probably a good design decision by seaborn, because the alternative (not accepting categorical data) would cause far more problems than the rare case where people plot both categorical and numerical data into the same figure. A warning from seaborn would be a good thing, though.
Upvotes: 3