Zythyr

Reputation: 1212

How can I specify multiple variables for the hue parameter when plotting with seaborn?

When using seaborn, is there a way I can include multiple variables (columns) for the hue parameter? Another way to ask this question would be how can I group my data by multiple variables before plotting them on a single x,y axis plot?

I want to do something like the following, but currently I am not able to specify two variables for the hue parameter:

sns.relplot(x='#', y='Attack', hue=['Legendary', 'Stage'], data=df)

For example, assume I have a pandas DataFrame like the one below, containing a Pokemon database obtained via this tutorial.

[Image: the Pokemon DataFrame]

I want to plot the pokedex # on the x-axis and the Attack on the y-axis. However, I want the data to be grouped by both Stage and Legendary. Using matplotlib, I wrote a custom function that groups the dataframe by ['Legendary','Stage'] and then iterates through each group for the plotting (see results below). Although my custom function works as intended, I was hoping this could be achieved simply with seaborn. I am guessing there must be other people who have attempted to visualize more than 3 variables in a single plot using seaborn?

fig, ax = plt.subplots()
grouping_variables = ['Stage','Legendary']
group_1 = df.groupby(grouping_variables)
for group_1_label, group_1_df in group_1:
    ax.scatter(group_1_df['#'], group_1_df['Attack'], label=group_1_label)
ax_legend = ax.legend(title=grouping_variables)    

[Image: scatter plot produced by the custom function]

Edit 1:

Note: In the example I provided, I grouped the data by only two variables (e.g. Legendary and Stage). However, other situations may require an arbitrary number of variables (e.g. 5 variables).

Upvotes: 13

Views: 33513

Answers (3)

dlukes

Reputation: 1793

You can leverage the fact that hue accepts either a column name, or a sequence of the same length as your data, listing the color categories to assign each data point to. So...

sns.relplot(x='#', y='Attack', hue='Stage', data=df)

... is basically the same as:

sns.relplot(x='#', y='Attack', hue=df['Stage'], data=df)

You typically wouldn't use the latter, since it's just more typing to achieve the same thing, unless you want to construct a custom sequence on the fly:

sns.relplot(x='#', y='Attack', data=df,
            hue=df[['Legendary', 'Stage']].apply(tuple, axis=1))

Seaborn plot using two columns for hue

How you build the sequence that you pass via hue is entirely up to you. The only requirements are that it has the same length as your data and, if it is array-like, that it is one-dimensional. So you can't just pass hue=df[['Legendary', 'Stage']]; you have to combine the columns into one somehow. I chose tuple as the simplest and most versatile way, but if you want more control over the formatting, build a Series of strings. I'll save it into a separate variable here for better readability and so that I can assign it a name (which will be used as the legend title), but you don't have to:

hue = df[['Legendary', 'Stage']].apply(
    lambda row: f"{row.Legendary}, {row.Stage}", axis=1)
hue.name = 'Legendary, Stage'
sns.relplot(x='#', y='Attack', hue=hue, data=df)

Seaborn plot using two columns for hue, fancier version
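
For the case raised in the question's edit (an arbitrary number of grouping columns), the same idea can be wrapped in a small helper. This is only a sketch: combine_columns and the toy DataFrame are hypothetical names made up for illustration, not part of pandas or seaborn.

```python
import pandas as pd


def combine_columns(df, cols, sep=", "):
    """Join any number of columns into one string label per row.

    Hypothetical helper for illustration; not a pandas or seaborn function.
    """
    # Cast to str first so booleans and integers can be joined as text.
    combined = df[cols].astype(str).apply(sep.join, axis=1)
    # The Series name is what seaborn would show as the legend title.
    combined.name = sep.join(cols)
    return combined


# Toy data invented for this sketch (same column names as the question).
toy = pd.DataFrame({
    "#": [1, 2, 3, 4],
    "Attack": [49, 62, 82, 110],
    "Stage": [1, 2, 3, 1],
    "Legendary": [False, False, False, True],
})

hue = combine_columns(toy, ["Legendary", "Stage"])
print(hue.tolist())
```

The resulting Series has one label per row and could be passed as hue=hue to sns.relplot in the same way as the hand-built Series above.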

Upvotes: 17

Parfait

Reputation: 107767

To use the hue parameter of seaborn.relplot, consider concatenating the needed groups into a single column and then running the plot on the new variable:

def run_plot(df, flds):
    # Create a new column of concatenated values (note: this modifies df in place)
    df['_'.join(flds)] = pd.Series(df.reindex(flds, axis='columns')
                                     .astype('str')
                                     .values.tolist(),
                                   index=df.index  # keep rows aligned with df
                                  ).str.join('_')

    # Plot with hue set to the new column; use the df argument, not a global
    sns.relplot(x='#', y='Attack', hue='_'.join(flds), data=df, aspect=1.5)
    plt.show()

    plt.clf()
    plt.close()

To demonstrate with random data:

Data

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

### DATA
np.random.seed(22320)
random_df = pd.DataFrame({'#': np.arange(1,501),
                          'Name': np.random.choice(['Bulbasaur', 'Ivysaur', 'Venusaur', 
                                                    'Charmander', 'Charmeleon'], 500),
                          'HP': np.random.randint(1, 100, 500),
                          'Attack': np.random.randint(1, 100, 500),
                          'Defense': np.random.randint(1, 100, 500),
                          'Sp. Atk': np.random.randint(1, 100, 500),
                          'Sp. Def': np.random.randint(1, 100, 500),
                          'Speed': np.random.randint(1, 100, 500),
                          'Stage': np.random.randint(1, 3, 500),
                          'Legend': np.random.choice([True, False], 500)
                          })

Plots

run_plot(random_df, ['Legend', 'Stage'])

Two Group Plot Output

run_plot(random_df, ['Legend', 'Stage', 'Name'])

Three Group Plot

Upvotes: 4

Diziet Asahi

Reputation: 40747

In seaborn's scatterplot(), you can combine a hue= and a style= parameter to produce a different color and a different marker for each combination.

Example (taken verbatim from the documentation):

tips = sns.load_dataset("tips")
ax = sns.scatterplot(x="total_bill", y="tip", data=tips)
ax = sns.scatterplot(x="total_bill", y="tip",
                     hue="day", style="time", data=tips)

[Image: the resulting scatter plot]
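
Applied to the columns from the question, this approach might look like the sketch below. The toy DataFrame is invented for illustration; only sns.scatterplot with its hue and style parameters comes from the example above. Note that this covers two grouping variables (one as color, one as marker), not an arbitrary number.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Toy data invented for this sketch (same column names as the question).
toy = pd.DataFrame({
    "#": [1, 2, 3, 4, 5, 6],
    "Attack": [49, 62, 82, 52, 64, 110],
    "Stage": [1, 2, 3, 1, 2, 1],
    "Legendary": [False, False, False, False, False, True],
})

# Stage is mapped to color, Legendary to marker style.
ax = sns.scatterplot(x="#", y="Attack", hue="Stage", style="Legendary", data=toy)
```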

Upvotes: 7
