Ahmad Anis

Reputation: 2724

Multiple Columns for HUE parameter in Seaborn violinplot

I am working with the tips dataset; here is its head:


   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

My code is

sns.violinplot(x='day',y='total_bill',data=tips, hue=['sex','smoker'])

I want a violin plot of total_bill by day in which hue reflects both sex and smoker, but I cannot find an option to pass multiple columns to hue. Is there a way to do this?

Upvotes: 20

Views: 21248

Answers (3)

Trenton McKinney

Reputation: 62553

  • This option creates a new column, similar to the answer from dlukes.
  • Create a column of combined strings from either 'day' and 'sex' or 'day' and 'smoker', pass it as x=, pass the remaining column ('smoker' or 'sex', respectively) as hue=, and set split=True.
  • Tested in Python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# load sample data
tips = sns.load_dataset("tips")

# create a new column
tips['Day - Sex'] = tips.day.astype(str) + ' - ' + tips.sex.astype(str)

# set to categorical to specify an order
categories = ['Thur - Female', 'Thur - Male', 'Fri - Female', 'Fri - Male', 'Sat - Female', 'Sat - Male', 'Sun - Female', 'Sun - Male']
tips['Day - Sex'] = pd.Categorical(tips['Day - Sex'], categories=categories, ordered=True)

# plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.violinplot(x='Day - Sex', y='total_bill', data=tips, hue='smoker', ax=ax, split=True)

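The category order above is typed out by hand. As a minimal sketch, the same list can be generated with itertools.product. The days and sexes lists here are assumptions written out for illustration, not read from the dataset, and the three-row sample Series is made up to show the ordering:

```python
from itertools import product

import pandas as pd

# Assumed display order for each factor (written out here for illustration).
days = ["Thur", "Fri", "Sat", "Sun"]
sexes = ["Female", "Male"]

# Every day/sex combination, with day varying slowest.
categories = [f"{day} - {sex}" for day, sex in product(days, sexes)]

# Hypothetical sample of combined labels, as produced by the string
# concatenation in the answer above.
sample = pd.Series(["Sun - Male", "Thur - Female", "Sat - Male"])

# An ordered categorical keeps the generated order rather than sorting
# the labels alphabetically.
ordered = pd.Categorical(sample, categories=categories, ordered=True)
print(list(ordered.categories))
```

Generating the list this way should keep the labels consistent if a factor gains a level, as long as the separator matches the one used when building the combined column.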

Upvotes: 2

dlukes

Reputation: 1793

The faceting approach suggested by the accepted answer is probably nicer in this case, but might not be easily applicable to other kinds of Seaborn plots (e.g. in my case, ecdfplot). So I just wanted to share that I figured out a solution which does what OP originally asked, i.e. actually use multiple columns for the hue parameter.

The trick is that hue can either be a column name, or a sequence of the same length as your data, listing the color categories to assign each data point to. So...

sns.violinplot(x='day', y='total_bill', data=tips, hue='sex')

... is basically the same as:

sns.violinplot(x='day', y='total_bill', data=tips, hue=tips['sex'])

You typically wouldn't use the latter, since it's just more typing to achieve the same thing, unless you want to construct a custom sequence on the fly:

sns.violinplot(x='day', y='total_bill', data=tips,
               hue=tips[['sex', 'smoker']].apply(tuple, axis=1))

Violin plot using two columns for hue parameter

How you build the sequence passed via hue is entirely up to you. The only requirements are that it has the same length as your data and, if it is array-like, that it is one-dimensional. So you can't just pass hue=tips[['sex', 'smoker']]; you have to combine the columns into one somehow. I chose tuple as the most versatile way, but if you want more control over the formatting, build a Series of strings (saved in a separate variable here for readability, though you don't have to):

hue = tips['sex'].astype(str) + ', ' + tips['smoker'].astype(str)
sns.violinplot(x='day', y='total_bill', data=tips, hue=hue)

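To make the one-dimensional requirement concrete, here is a small sketch that uses only pandas. The five rows are copied from the head() in the question as a stand-in for the full dataset, and combine_columns is a hypothetical helper name, not part of seaborn or pandas:

```python
import pandas as pd

# Stand-in for the tips dataset: the five rows shown in the question.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun"] * 5,
})


def combine_columns(df, columns, sep=", "):
    """Hypothetical helper: join several columns into one 1-D Series of strings."""
    combined = df[columns[0]].astype(str)
    for col in columns[1:]:
        combined = combined + sep + df[col].astype(str)
    return combined


two_columns = tips[["sex", "smoker"]]          # DataFrame: two-dimensional
as_tuples = two_columns.apply(tuple, axis=1)   # Series of tuples: one-dimensional
as_strings = combine_columns(tips, ["sex", "smoker"])  # Series of strings

print(two_columns.ndim, as_tuples.ndim, as_strings.ndim)
print(as_strings.tolist())
```

Either as_tuples or as_strings has one entry per row of tips, so it can be passed as hue= in the violinplot calls above.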

Upvotes: 29

Zephyr

Reputation: 12524

You could use seaborn.catplot with 'sex' as hue and 'smoker' as col to generate two side-by-side violin plots:

import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

tips = sns.load_dataset("tips")

sns.catplot(x = "day",
            y = "total_bill",
            hue = "sex",
            col = "smoker",
            data = tips,
            kind = "violin",
            split = True)

plt.show()

This produces two side-by-side violin plots, one panel per value of smoker, with each violin split by sex.

Upvotes: 7
