dude

Reputation: 83

Seaborn lineplot: show a custom central line instead of the mean

I have a dataset that consists of three values for each timestep: the mean value, followed by the lower and upper error bounds.

name,year,area
test,2017,1.0376800009967053 #mean
test,2017,0.09936810445983806 #lower bound
test,2017,2.118230806622908 #upper bound and so on ...
test,2018,1.0
test,2018,0.13705391957353763
test,2018,2.1881023056535183
test,2019,1.2928531655977922
test,2019,0.17400072775054737
test,2019,3.016064939443665

I would like to plot the data so that I get a shaded area between the upper and the lower bound and have a line in between that follows the mean value in the dataset.

I have tried seaborn.lineplot (https://seaborn.pydata.org/examples/errorband_lineplots.html), but it calculates the mean of the three values, so the line is not where the actual mean should be. Does anybody have any ideas? Is it possible to change the way seaborn calculates the central line (for example, to the median)?

Upvotes: 3

Views: 2489

Answers (1)

gehbiszumeis

Reputation: 3711

You can use the estimator keyword of seaborn.lineplot. The documentation says this about it:

estimator : name of pandas method or callable or None, optional

Method for aggregating across multiple observations of the y variable at the same x level. If None, all observations will be drawn.
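To make "aggregating across multiple observations at the same x level" concrete, here is a rough pandas analogue using the rows from the question. It is only a sketch of the idea, not seaborn's internal implementation, and it assumes that the first row of each year is the mean, as in the sample data.

```python
import io

import pandas as pd

# Sample rows from the question. Assumed order within each year:
# mean, lower bound, upper bound.
csv_text = """name,year,area
test,2017,1.0376800009967053
test,2017,0.09936810445983806
test,2017,2.118230806622908
test,2018,1.0
test,2018,0.13705391957353763
test,2018,2.1881023056535183
test,2019,1.2928531655977922
test,2019,0.17400072775054737
test,2019,3.016064939443665
"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per year, one column per way of collapsing the three "area" values.
per_year = df.groupby("year")["area"].agg(["mean", "first", "median"])
print(per_year)
```

The "mean" column averages the stored mean together with both bounds, which is why the default line ends up in the wrong place. The "first" column picks the stored mean row. For this layout the "median" column gives the same values, because the stored mean always lies between its lower and upper bound.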

The default value for estimator is mean, which explains the observation described in your question. So you can define a lambda function that always selects the first of the three values for the same year.

lambda x: x[0]

Using

import seaborn as sns
sns.lineplot(x='year', y='area', data=df, estimator=lambda x: x[0], marker='o')

gives the plot you want.


If you want the median instead, import numpy as np beforehand and use estimator=np.median.
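If you would rather not depend on how seaborn computes its error band, one alternative is to reshape the data yourself and draw the band with matplotlib's fill_between. The following is a minimal sketch under some assumptions: the first row of each year is the mean, the smallest value is the lower bound and the largest is the upper bound. The column names center, lower and upper and the output file name area_band.png are made up for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question. Assumed order within each year:
# mean, lower bound, upper bound.
csv_text = """name,year,area
test,2017,1.0376800009967053
test,2017,0.09936810445983806
test,2017,2.118230806622908
test,2018,1.0
test,2018,0.13705391957353763
test,2018,2.1881023056535183
test,2019,1.2928531655977922
test,2019,0.17400072775054737
test,2019,3.016064939443665
"""
df = pd.read_csv(io.StringIO(csv_text))

# Collapse the three rows per year into one row with three columns.
# "first" relies on the assumed row order; "min"/"max" do not.
bands = df.groupby("year")["area"].agg(center="first", lower="min", upper="max")

fig, ax = plt.subplots()
# Shaded area between the lower and the upper bound.
ax.fill_between(bands.index, bands["lower"], bands["upper"], alpha=0.3)
# Central line that follows the stored mean values.
ax.plot(bands.index, bands["center"], marker="o")
ax.set_xticks(list(bands.index))
ax.set_xlabel("year")
ax.set_ylabel("area")
fig.savefig("area_band.png")  # hypothetical file name; use plt.show() interactively
```

This way the central line and the band come directly from the values in the dataset, without any bootstrapping.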

Upvotes: 1
