user313967

Reputation: 2063

Is this an error in the seaborn.lineplot hue parameter?

With this code snippet, I'm expecting a line plot with one line per hue level, where the hue column has these distinct values: [1, 5, 10, 20, 40].

import math
import pandas as pd
import seaborn as sns

sns.set(style="whitegrid")

TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
COLUMNS = ['x', 'y', 'hue group']

frames = []

for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}
    frames.append(pd.DataFrame(data=data, columns=COLUMNS))

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
tanh_df = pd.concat(frames, ignore_index=True)

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df);

However, what I get is a hue legend with the values [0, 15, 30, 45], plus an additional line.

Is this a bug or am I missing something obvious?

Upvotes: 3

Views: 2598

Answers (2)

mwaskom

Reputation: 49032

As @LudvigH's comment on the other answer says, this isn't a bug, even if the default behavior is surprising in this case. As explained in the docs:

The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.
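A quick way to see which case applies here is to inspect the hue column itself. The sketch below is not seaborn's inference code. It rebuilds the question's data with `pd.concat` (an assumption, since `DataFrame.append` was removed in pandas 2.0) and only checks, with pandas, that the hue column holds numbers, which is the situation the docs describe as getting a sequential colormap.

```python
import math

import pandas as pd

TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = list(range(16))  # same values as the question: 0..15

# Rebuild the question's long-form table in a single pd.concat call
tanh_df = pd.concat(
    [
        pd.DataFrame({
            "x": X_VALUES,
            "y": [math.tanh(x / sc) for x in X_VALUES],
            "hue group": [sc] * len(X_VALUES),
        })
        for sc in TANH_SCALING
    ],
    ignore_index=True,
)

hue = tanh_df["hue group"]
# Plain integers give a numeric dtype, i.e. "numeric" data in the docs' sense
print(hue.dtype)
print(pd.api.types.is_numeric_dtype(hue))  # True
print(sorted(hue.unique().tolist()))       # [1, 5, 10, 20, 40]
```

Only five levels exist in the data, yet a numeric mapping lets the legend show evenly spaced "ticks" instead of those five values.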

Here are two specific ways to control the behavior.

If you want to keep the numeric color mapping but have the legend show the exact values in your data, set legend="full":

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, legend="full")


If you want seaborn to treat the levels of the hue variable as discrete categories, pass the name of a categorical palette, or a list or dictionary of the specific colors you want to use:

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, palette="deep")
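For the dictionary form, one option is to key the palette by the distinct hue levels. This is a minimal sketch: the `hue` Series is a hypothetical stand-in for the question's `'hue group'` column, and the Tableau color names are arbitrary choices (any matplotlib color specification should work).

```python
import pandas as pd

# Hypothetical stand-in for the question's 'hue group' column
hue = pd.Series([1, 5, 10, 20, 40] * 16, name="hue group")

# One entry per distinct level, in sorted order
levels = sorted(hue.unique().tolist())

# Arbitrary example colors, one per level
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]

# Map each level to its own color
palette = dict(zip(levels, colors))
print(palette)
```

The resulting dictionary would then be passed as `palette=palette` in the `sns.lineplot` call in place of `"deep"`.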


Upvotes: 0

Quang Hoang

Reputation: 150815

This is a known bug in seaborn when the hue values can be cast to integers. You can add a prefix to the hue values so that the cast fails:

frames = []
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[f'A{sc}']}             # changes here
    frames.append(pd.DataFrame(data=data, columns=COLUMNS))
tanh_df = pd.concat(frames, ignore_index=True)  # replaces the removed DataFrame.append


Or, if you would rather keep the numeric data as it is, change the hue only when plotting:

# data creation (numeric hue values, as in the question)
frames = []
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}
    frames.append(pd.DataFrame(data=data, columns=COLUMNS))
tanh_df = pd.concat(frames, ignore_index=True)

# hue manipulation
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], 
             hue='A_' + tanh_df[COLUMNS[2]].astype(str), # change hue here
             data=tanh_df);
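The prefix trick can be illustrated without seaborn. Assume, as this answer describes, that the hue column is treated as numeric whenever its values survive a numeric cast (seaborn's exact check differs between versions, and `can_cast_to_float` below is a hypothetical helper, not a seaborn function). Under that assumption a bare `astype(str)` would not be enough, because strings like `'1'` still convert; the letter prefix is what makes the cast fail.

```python
import pandas as pd


def can_cast_to_float(series):
    """Return True if every value in ``series`` converts to float."""
    try:
        series.astype(float)
    except ValueError:
        return False
    return True


numeric_hue = pd.Series([1, 5, 10, 20, 40])
string_hue = numeric_hue.astype(str)   # '1', '5', ... still look numeric
prefixed_hue = "A_" + string_hue       # 'A_1', 'A_5', ...

print(can_cast_to_float(numeric_hue))   # True
print(can_cast_to_float(string_hue))    # True
print(can_cast_to_float(prefixed_hue))  # False
```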

Upvotes: 3
