an10b3

Reputation: 273

How to overlay data points on seaborn figure-level boxplots

I have the following dataframe and plot below, and I want to add data points to each box plot in the factorplot. However, I am having trouble combining box and strip plots in the same graph: they don't overlay, they appear below each other. Is there a solution for this?

import numpy as np
import pandas as pd
import seaborn as sns

idx = pd.date_range('01-01-2020', '01-25-2020')

d = pd.Series({'01-01-2020': 1,
               '01-25-2020': 1})

d.index = pd.DatetimeIndex(d.index)

d = d.reindex(idx, fill_value=0)
df = pd.DataFrame(d).rename_axis("dt").reset_index()
df.drop(columns=df.columns[1], inplace=True)

# calculate ISO week number (Series.dt.week was removed in pandas 2.0)
df["week"] = df["dt"].dt.isocalendar().week.astype(int)

# create column counts for 'nhsbt centres'
df["A"] = np.random.randint(0, 50, df.shape[0])
df["B"] = np.random.randint(0, 30, df.shape[0])
df["C"] = np.random.randint(0, 20, df.shape[0])

# melt dataframe 
df1 = df[["A", "B", "C", "week"]]
df1 = df1.set_index("week")
df1 = df1.melt(ignore_index=False)
df1["week"] = df1.index

# make boxplot
sns.factorplot("week", "value", col="variable", data=df1, kind="box")

Upvotes: 2

Views: 3879

Answers (2)

Trenton McKinney

Reputation: 62513

import seaborn as sns

# make boxplot with data from the OP
g = sns.catplot(x="week", y="value", col="variable", data=df1, kind="box")
g.map(sns.swarmplot, 'week', 'value', color='k', order=sorted(df1.week.unique()))
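
The order argument matters here: FacetGrid.map calls the plotting function once per facet with only that facet's rows, so fixing the category order keeps the points aligned with the boxes in every panel. As a minimal sketch of the same idea, assuming a made-up long-form frame named demo in place of df1, the keyword-based map_dataframe with sns.stripplot could look like this:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for df1: 3 groups x 4 weeks x 6 random values.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "week": np.tile(np.repeat([1, 2, 3, 4], 6), 3),
    "variable": np.repeat(["A", "B", "C"], 24),
    "value": rng.integers(0, 50, 72),
})

week_order = sorted(demo["week"].unique())

# Figure-level box plots, one column of the grid per variable.
g = sns.catplot(data=demo, x="week", y="value", col="variable",
                kind="box", order=week_order)

# Draw the raw points on top of each facet, using the same category order.
g.map_dataframe(sns.stripplot, x="week", y="value",
                color="k", order=week_order)
```

The column names and the random values are illustrative only; any long-form frame with one categorical x column and one numeric y column should work the same way.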


Upvotes: 2

Zephyr

Reputation: 12506

If you have a dataframe like this:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np


idx = pd.date_range('01-01-2020', '01-25-2020')

d = pd.Series({'01-01-2020': 1,
               '01-25-2020': 1})

d.index = pd.DatetimeIndex(d.index)

d = d.reindex(idx, fill_value=0)
df = pd.DataFrame(d).rename_axis("dt").reset_index()
df.drop(columns=df.columns[1], inplace=True)

# ISO week number (Series.dt.week was removed in pandas 2.0)
df["week"] = df["dt"].dt.isocalendar().week.astype(int)
df["groupA"] = np.random.randint(0, 50, df.shape[0])
df["groupB"] = np.random.randint(0, 30, df.shape[0])
df["groupC"] = np.random.randint(0, 20, df.shape[0])

df1 = df[["groupA", "groupB", "groupC", "week"]]
df1 = df1.set_index("week")
df1 = df1.melt(ignore_index=False)
df1["week"] = df1.index

     variable  value  week
week                      
1      groupA     24     1
1      groupA     30     1
1      groupA     38     1
1      groupA     41     1
1      groupA     42     1
2      groupA     47     2
2      groupA      9     2
2      groupA     16     2
2      groupA     24     2
2      groupA      3     2
2      groupA     27     2
2      groupA     48     2
3      groupA     46     3
3      groupA     29     3
3      groupA      2     3
3      groupA     46     3
3      groupA     48     3
3      groupA     26     3
3      groupA     36     3
4      groupA     48     4
4      groupA     38     4
4      groupA     19     4
4      groupA     13     4
4      groupA     38     4
4      groupA     34     4
1      groupB     11     1
1      groupB     15     1
1      groupB     14     1
1      groupB     29     1
1      groupB      6     1
2      groupB     20     2
2      groupB     14     2
2      groupB     26     2
2      groupB     11     2
2      groupB     14     2
2      groupB      0     2
2      groupB     11     2
3      groupB     20     3
3      groupB     17     3
3      groupB     16     3
3      groupB     24     3
3      groupB     24     3
3      groupB     16     3
3      groupB     22     3
4      groupB     10     4
4      groupB     26     4
4      groupB      3     4
4      groupB      7     4
4      groupB     16     4
4      groupB     18     4
1      groupC     12     1
1      groupC      4     1
1      groupC      1     1
1      groupC      9     1
1      groupC     16     1
2      groupC      6     2
2      groupC     12     2
2      groupC      6     2
2      groupC     14     2
2      groupC      2     2
2      groupC     18     2
2      groupC     10     2
3      groupC     13     3
3      groupC     11     3
3      groupC     15     3
3      groupC      9     3
3      groupC     18     3
3      groupC      7     3
3      groupC      4     3
4      groupC      8     4
4      groupC     13     4
4      groupC      3     4
4      groupC      1     4
4      groupC      5     4
4      groupC      4     4

Then you could combine seaborn.boxplot and seaborn.swarmplot or seaborn.stripplot with a for loop:

groups = df1['variable'].unique()

# one subplot per group, side by side
fig, axes = plt.subplots(1, len(groups), figsize=(15, 5))

for ax, group in zip(axes, groups):
    subset = df1[df1['variable'] == group]
    # draw the boxes first, then the individual points on the same axes
    sns.boxplot(ax=ax, data=subset, x='week', y='value')
    sns.swarmplot(ax=ax, data=subset, x='week', y='value', color='black')
    ax.set_title(group)

plt.show()
[Figure: result with seaborn.swarmplot]
[Figure: result with seaborn.stripplot]
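
For the seaborn.stripplot variant, only the second call inside the loop changes. A minimal self-contained sketch, assuming a synthetic frame named demo with the same columns as df1 (the values are made up) and hiding the box plot's outlier markers so those points are not drawn twice:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-form data shaped like df1 (hypothetical values).
rng = np.random.default_rng(0)
weeks = np.repeat([1, 2, 3, 4], 6)
demo = pd.concat(
    [pd.DataFrame({"week": weeks,
                   "variable": name,
                   "value": rng.integers(0, high, weeks.size)})
     for name, high in [("groupA", 50), ("groupB", 30), ("groupC", 20)]],
    ignore_index=True,
)

groups = demo["variable"].unique()
fig, axes = plt.subplots(1, len(groups), figsize=(15, 5))

for ax, group in zip(axes, groups):
    subset = demo[demo["variable"] == group]
    # showfliers=False is passed through to matplotlib and hides outlier markers.
    sns.boxplot(ax=ax, data=subset, x="week", y="value", showfliers=False)
    # Jittered points drawn on the same axes, on top of the boxes.
    sns.stripplot(ax=ax, data=subset, x="week", y="value",
                  color="black", jitter=True)
    ax.set_title(group)
```

Calling plt.show() afterwards displays the figure. With many points per box, the jittered strip plot tends to be the more practical of the two, since a swarm plot has to place every point without overlap.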

NOTES

As far as I know, you cannot overlay two calls to seaborn.factorplot (now seaborn.catplot): it is a figure-level function, so each call creates its own figure instead of drawing onto existing axes.

Upvotes: 1
