noob

Reputation: 6854

Seaborn pairplots with continuous hues?

How may I introduce a continuous hue to my seaborn pairplots?

I am passing in a pandas data frame train_df in order to visualise the relationship between the multiple features.

However I'd also like to add a hue which would use their corresponding target values, target_df. These target values are on a continuous scale (~ floats between 10 and 100).

I have defined a sns.color_palette("RdGr") that I'd like to use.

Right now I have the following pairplot (with no hue):

sns.pairplot(train_df)

How can I pass in target_df as the hue, using the color palette defined above?

Many thanks in advance.

Upvotes: 9

Views: 4651

Answers (4)

David

Reputation: 1

I don't have enough reputation to comment, so I am posting a separate answer. I modified the selected answer (by Quang Hoang) in two respects.

First, if your column names are not the integers that index the subplot grid, you need to distinguish the column labels, which select the data as in train_df[col_i], from the integer positions i and j, which select the subplot as in ax[i,j].

Second, seaborn's distplot is deprecated, so switch that to histplot.

num_feats = len(train_df.columns)
fig, ax = plt.subplots(num_feats, num_feats, figsize=(10,10))

# In order to assign the plots to the correct axes,
# create a dictionary to refer the column names to integers.
col_dict = dict(zip(train_df.columns, range(len(train_df.columns))))

for col_i in train_df.columns:
    i = col_dict[col_i]
    for col_j in train_df.columns:
        j = col_dict[col_j]
        if col_i==col_j:
            sns.histplot(train_df[col_i], kde=False, ax=ax[i,j])
        else:
            sns.scatterplot(x=train_df[col_i], y=train_df[col_j], ax=ax[i,j], hue=target_df, legend=False)

fig.tight_layout()  # make room for subplot labels
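
The same label-to-position bookkeeping can also be sketched with enumerate, which yields each column label together with its integer position. The helper name grid_cells and the example labels are hypothetical, used only for illustration; this is not part of seaborn or of the original answer.

```python
def grid_cells(columns):
    """Yield (row, col, row_label, col_label, is_diagonal) for every subplot.

    Hypothetical helper: the integers index the axes array,
    the labels index the data frame.
    """
    for i, col_i in enumerate(columns):
        for j, col_j in enumerate(columns):
            yield i, j, col_i, col_j, i == j


# Example labels (assumed); any non-integer column names behave the same way.
cells = list(grid_cells(["carat", "depth", "table"]))
```

Inside the plotting loop, each tuple would then give ax[i, j] for the subplot and train_df[col_i], train_df[col_j] for the data.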

Upvotes: 0

Quang Hoang

Reputation: 150735

You can just assign the target_df as a column in train_df and pass it as hue:

sns.pairplot(data=train_df.assign(target=target_df),
             hue='target')

However, this will be extremely slow if your target is continuous. Instead, you can do a double for loop:

num_features = len(train_df.columns)
fig,ax = plt.subplots(num_features, num_features, figsize=(10,10))
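# Assumes the column labels are the integers 0..num_features-1,
# because i and j are also used to index the axes array.
for i in train_df.columns:
    for j in train_df.columns:
        if i==j:    # diagonal
            sns.distplot(train_df[i], kde=False, ax=ax[i][j])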
        else:       # off diagonal
            sns.scatterplot(x=train_df[i],y=train_df[j], 
                            ax=ax[i][j], hue=target_df, palette='BrBG',
                            legend=False)

Which gives you something like this:

[Image: the resulting grid of plots]

Upvotes: 4

mwaskom

Reputation: 48992

pairplot (well, the underlying PairGrid) now delegates hue to the mapped function if it supports it. You probably won't want to draw a distinct marginal density for each price, though:

diamonds = sns.load_dataset("diamonds")
sns.pairplot(
    diamonds,
    hue="price", vars=["carat", "depth", "table"], 
    diag_kws=dict(color=".2", hue=None)
)

Old answer, useful for demonstrating more flexible use of PairGrid:

This could be easier than it currently is, but it's not necessary to recreate PairGrid yourself.

diamonds = sns.load_dataset("diamonds")
g = sns.PairGrid(diamonds, vars=["carat", "depth", "table"])
g.map_diag(sns.kdeplot, color=".2")
g.map_offdiag(sns.scatterplot, hue=diamonds["price"], s=5, linewidth=0)
g.axes[1, -1].legend(loc="center left", bbox_to_anchor=(.7, .5))

Upvotes: 7

kynnemall

Reputation: 888

In the documentation ( https://seaborn.pydata.org/generated/seaborn.pairplot.html ), you can see that you just need to set palette = my_palette when calling the function like so:

sns.pairplot(train_df, palette = my_palette)

Upvotes: -2
