Reputation: 83
I have a dataset that contains three values for each time step: the mean value, followed by the lower and upper error bounds.
name,year,area
test,2017,1.0376800009967053 #mean
test,2017,0.09936810445983806 #lower bound
test,2017,2.118230806622908 #upper bound and so on ...
test,2018,1.0
test,2018,0.13705391957353763
test,2018,2.1881023056535183
test,2019,1.2928531655977922
test,2019,0.17400072775054737
test,2019,3.016064939443665
I would like to plot the data so that I get a shaded area between the upper and the lower bound and have a line in between that follows the mean value in the dataset.
I have tried seaborn.lineplot (https://seaborn.pydata.org/examples/errorband_lineplots.html); however, it calculates the mean of the three values, so the line is not where the actual mean should be. Does anybody have some ideas? Is it possible to change the way seaborn calculates the central line (for example, to the median)?
Upvotes: 3
Views: 2489
Reputation: 3711
You can use the estimator keyword of seaborn.lineplot. The documentation says the following about it:
estimator : name of pandas method or callable or None, optional
Method for aggregating across multiple observations of the y variable at the same x level. If None, all observations will be drawn.
The default value for estimator is mean, which explains the observation described in the question. So you can define a lambda function that always selects the first of the three values for each year.
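lambda x: x.iloc[0]
Here .iloc[0] selects by position. A plain x[0] is a label lookup on the pandas Series that seaborn passes in, and it can fail for years whose rows do not carry the index label 0.
Using
import seaborn as sns
sns.lineplot(x='year', y='area', data=df, estimator=lambda x: x.iloc[0], marker='o')
puts the line on the mean values, assuming the mean is always the first row of each year.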
If you want the median instead, import numpy as np beforehand and use estimator=np.median.
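Since the question also asks for a shaded area between the given lower and upper bounds, one alternative to the seaborn approach is to reshape the data and draw the band directly with matplotlib's fill_between. The following is only a minimal sketch. It assumes the three rows per year always appear in the order mean, lower, upper. The column name kind and the variable wide are introduced here for illustration, and the sample rows are copied from the question without the trailing comments.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question (comments stripped).
csv_text = """name,year,area
test,2017,1.0376800009967053
test,2017,0.09936810445983806
test,2017,2.118230806622908
test,2018,1.0
test,2018,0.13705391957353763
test,2018,2.1881023056535183
test,2019,1.2928531655977922
test,2019,0.17400072775054737
test,2019,3.016064939443665
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: within each year the rows are ordered mean, lower, upper.
labels = {0: "mean", 1: "lower", 2: "upper"}
df["kind"] = df.groupby("year").cumcount().map(labels)

# Reshape to one row per year with the columns mean, lower and upper.
wide = df.pivot(index="year", columns="kind", values="area")

fig, ax = plt.subplots()
# Shaded band between the given bounds.
ax.fill_between(wide.index, wide["lower"], wide["upper"], alpha=0.3, label="bounds")
# Central line that follows the given mean values.
ax.plot(wide.index, wide["mean"], marker="o", label="mean")
ax.set_xlabel("year")
ax.set_ylabel("area")
ax.legend()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end. If the row order inside a year is not guaranteed, the labelling step would need an explicit column in the data instead of the positional cumcount().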
Upvotes: 1