user3446276

Reputation: 55

Seaborn boxplot and lineplot not showing properly

I'm trying to overlay a seaborn lineplot on a seaborn boxplot. The result is somewhat "shocking" :) It seems as if the two graphs are put in the same figure but side by side: the box plot is compressed on the left side and the line plot is compressed on the right side.

Notice that if I run the two graphs separately, they work fine. I cannot figure out how to make it work. Thank you in advance for any help.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

mydata = pd.DataFrame({
    'a':[2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    'v':[383.00, 519.00, 366.00, 436.00, 1348.00, 211.00, 139.00, 614.00, 365.00, 365.00, 383.00, 602.00, 994.00, 719.00, 589.00, 365.00, 990.00, 1142.00, 262.00, 1263.00, 507.00, 222.00, 363.00, 274.00, 195.00, 730.00, 730.00, 592.00, 479.00, 607.00, 292.00, 657.00, 453.00, 691.00, 673.00, 705]
})

means = mydata.groupby('a').v.mean().reset_index()

fig, ax = plt.subplots(figsize=(15,8))
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.lineplot(data=means, x='a', y='v', ax=ax)
plt.show()

Upvotes: 3

Views: 2855

Answers (1)

Mr. T

Reputation: 12410

Surprisingly, I did not find a duplicate for this question with a good answer, so I elevate my comment to one. Arise, Sir Comment:

Instead of lineplot, you should use pointplot:

...
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.pointplot(data=means, x='a', y='v', ax=ax) 
plt.show()

Sample output: the point plot of the yearly means is drawn on top of the boxes, aligned with each year.
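For reference, here is a self-contained version of the fixed script. It is a sketch that assumes the question's data (rebuilt with a list comprehension instead of the long literal) and the non-interactive Agg backend, so it also runs without a display. The printed x-limits should stay within the nine category slots instead of reaching out to 2012 and beyond; the exact values may differ between seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The question's data: four values per year for 2012-2020
years = [year for year in range(2012, 2021) for _ in range(4)]
values = [383, 519, 366, 436, 1348, 211, 139, 614, 365, 365, 383, 602,
          994, 719, 589, 365, 990, 1142, 262, 1263, 507, 222, 363, 274,
          195, 730, 730, 592, 479, 607, 292, 657, 453, 691, 673, 705]
mydata = pd.DataFrame({'a': years, 'v': values})
means = mydata.groupby('a').v.mean().reset_index()

fig, ax = plt.subplots(figsize=(15, 8))
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
# pointplot is a categorical plot like boxplot, so both use the same
# slot for each year instead of the raw values 2012, 2013, ...
sns.pointplot(data=means, x='a', y='v', ax=ax)

print("x-axis limits:", ax.get_xlim())  # stays around the nine category slots
plt.show()  # shows nothing under Agg; drop the matplotlib.use line for a window
```

Since pointplot aggregates with the mean by default, passing mydata directly instead of means should also draw the yearly means, although it then adds error bars.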

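Pointplot is the counterpart of lineplot for categorical variables, which is how boxplot treats the x column. boxplot places each category at an integer position (0, 1, 2, ...) and only labels the ticks 2012, 2013, and so on, while lineplot treats the numeric years as numbers and draws them at x = 2012 ... 2020. The shared x-axis therefore has to stretch from 0 to about 2020, which squeezes the boxes to the left and the line to the right. pointplot uses the same categorical positions as boxplot, so the two line up. Please read more about relational and categorical plotting in the seaborn documentation.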

The question came up why there is no problem with lineplot for the following data:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

mydata = pd.DataFrame({
    'a': ["m1", "m1", "m1", "m2", "m2", "m2", "m2", "m3", "m3", "m3", "m3", "m4", "m4", "m4", "m4"],
    'v': [11.37, 11.31, 10.93, 9.43, 9.62, 6.61, 9.31, 11.27, 8.47, 11.86, 8.77, 8.8, 9.58, 12.26, 10],
})
means = mydata.groupby('a').v.mean().reset_index()
print(means)
fig, ax = plt.subplots(figsize=(15, 8))
sns.boxplot(data=mydata, x='a', y='v', ax=ax, showfliers=False)
sns.lineplot(data=means, x='a', y='v', ax=ax)
plt.show()

Output: the line of means is drawn over the boxes as expected.

The difference is that this example leaves no ambiguity for lineplot. Seaborn lineplot accepts both categorical and numerical data. Seemingly, it plots the values as numbers when it can, and only treats them as categories when it cannot, as with the strings m1 to m4 (I don't know the source code). In that case both plots put m1, m2, m3, m4 at positions 0, 1, 2, 3 and line up; with the years, lineplot takes the numeric route and the positions no longer match. This is probably a good design decision by seaborn, because the alternative (not accepting categorical data at all) would cause far more problems than the rare case of people plotting categorical and numerical data into the same figure. A warning from seaborn would be a good thing, though.

Upvotes: 3
