Cranjis

Reputation: 1960

How to plot multiple lines with error bars

I have a dataframe:

df =
f1.  f2.  f3.  f4.  f5.  g
1.    2.  3.   4.   1.   0 
2.    4.  6.   8.   7.   0
1.    2.  3.   6.   1.   1 
5.    4.  6.   8.   7.   1
9.    2.  7.   5.   1.   0 
8.    4.  2.   4.   5.   1

I want to draw a line plot with error bands, where every row is a separate sample, the hue is determined by column g, the y-values are the numbers, and the x-axis is the columns (f1, f2, f3, f4, f5). Is that possible?

Upvotes: 0

Views: 2130

Answers (2)

David Erickson

Reputation: 16683

Quite often for these problems, you need to transform your dataframe into a long structure with .melt():

import pandas as pd
import seaborn as sns
df1 = df.melt(id_vars='g')
sns.lineplot(data=df1, x='variable', y='value', hue='g')
df1

Out[1]: 
    g variable  value
0   0      f1.    1.0
1   0      f1.    2.0
2   1      f1.    1.0
3   1      f1.    5.0
4   0      f1.    9.0
5   1      f1.    8.0
6   0      f2.    2.0
7   0      f2.    4.0
8   1      f2.    2.0
9   1      f2.    4.0
10  0      f2.    2.0
11  1      f2.    4.0
12  0      f3.    3.0
13  0      f3.    6.0
14  1      f3.    3.0
15  1      f3.    6.0
16  0      f3.    7.0
17  1      f3.    2.0
18  0      f4.    4.0
19  0      f4.    8.0
20  1      f4.    6.0
21  1      f4.    8.0
22  0      f4.    5.0
23  1      f4.    4.0
24  0      f5.    1.0
25  0      f5.    7.0
26  1      f5.    1.0
27  1      f5.    7.0
28  0      f5.    1.0
29  1      f5.    5.0
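
As a self-contained sketch of the reshape step, the snippet below rebuilds the question's data (assuming the column labels are 'f1.' through 'f5.', as in the output above) and melts it. The names feature and measurement are illustrative choices passed through melt's var_name and value_name parameters; they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question; the 'f1.' style labels are an assumption
# based on the melted output shown above.
df = pd.DataFrame({
    'f1.': [1.0, 2.0, 1.0, 5.0, 9.0, 8.0],
    'f2.': [2.0, 4.0, 2.0, 4.0, 2.0, 4.0],
    'f3.': [3.0, 6.0, 3.0, 6.0, 7.0, 2.0],
    'f4.': [4.0, 8.0, 6.0, 8.0, 5.0, 4.0],
    'f5.': [1.0, 7.0, 1.0, 7.0, 1.0, 5.0],
    'g': [0, 0, 1, 1, 0, 1],
})

# Wide to long: one row per (sample, column) pair, keeping 'g' as the id.
# 'feature' and 'measurement' are hypothetical names chosen for readability.
long_df = df.melt(id_vars='g', var_name='feature', value_name='measurement')

print(long_df.shape)
print(long_df.head())
```

With these names, the plot call would become sns.lineplot(data=long_df, x='feature', y='measurement', hue='g').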


Upvotes: 4

Trenton McKinney

Reputation: 62403

  • Reshape the dataframe from a wide to long format using pandas.DataFrame.melt
  • Plot the data with seaborn.pointplot
    • A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and provides some indication of the uncertainty around that estimate using error bars.
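      • The point drawn will be the mean, or some other specified estimator.
      • If ci is not specified, the bars show a bootstrapped 95% confidence interval around the estimate. Use ci='sd' for the bars to represent the standard deviation. In seaborn 0.12 and later, ci is deprecated in favor of errorbar (for example, errorbar='sd').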
    • Specify hue='g' to separate the data by 'g'.
    • Use dodge to separate the colors at each point for readability.
import pandas as pd
import seaborn as sns

# sample data
data = {'f1.': [1.0, 2.0, 1.0, 5.0, 9.0, 8.0], 'f2.': [2.0, 4.0, 2.0, 4.0, 2.0, 4.0], 'f3.': [3.0, 6.0, 3.0, 6.0, 7.0, 2.0], 'f4.': [4.0, 8.0, 6.0, 8.0, 5.0, 4.0], 'f5.': [1.0, 7.0, 1.0, 7.0, 1.0, 5.0], 'g': [0, 0, 1, 1, 0, 1]}
df = pd.DataFrame(data)

# reshape the dataframe
dfm = df.melt(id_vars='g')

# plot
p = sns.pointplot(data=dfm, x='variable', y='value', hue='g', ci='sd', dodge=0.25)
p.set_title('Error bars are standard deviation')
p.legend(title='g', bbox_to_anchor=(1.05, 1), loc='upper left')


p = sns.pointplot(data=dfm, x='variable', y='value', hue='g', dodge=0.25)
p.set_title('Error bars are the 95% confidence interval')
p.legend(title='g', bbox_to_anchor=(1.05, 1), loc='upper left')
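
To check what the ci='sd' bars are summarizing, the per-point mean and standard deviation can be computed directly with pandas. This is a minimal sketch, not seaborn's internal code: it assumes the sample data above and uses pandas' default sample standard deviation (ddof=1), which may differ slightly from what a given seaborn version draws.

```python
import pandas as pd

# Same sample data as in the answer above.
data = {
    'f1.': [1.0, 2.0, 1.0, 5.0, 9.0, 8.0],
    'f2.': [2.0, 4.0, 2.0, 4.0, 2.0, 4.0],
    'f3.': [3.0, 6.0, 3.0, 6.0, 7.0, 2.0],
    'f4.': [4.0, 8.0, 6.0, 8.0, 5.0, 4.0],
    'f5.': [1.0, 7.0, 1.0, 7.0, 1.0, 5.0],
    'g': [0, 0, 1, 1, 0, 1],
}
df = pd.DataFrame(data)

# Reshape to long format, as in the plotting code.
dfm = df.melt(id_vars='g')

# One row per (x position, hue level): the mean is the plotted point,
# and mean +/- std is the extent of a standard-deviation error bar
# (assumption: sample standard deviation, ddof=1).
summary = dfm.groupby(['variable', 'g'])['value'].agg(['mean', 'std'])
summary['lower'] = summary['mean'] - summary['std']
summary['upper'] = summary['mean'] + summary['std']

print(summary)
```

Each row of summary corresponds to one point in the plot, so it can be compared against the figure by eye.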


Upvotes: 4
