bismo

Reputation: 1439

How to plot many kdeplots on one figure in Python

I have the following data:

name            val
G.Kittle        4.0
G.Kittle        10.0
D.Hopkins       3.0
L.Fitzgerald    6.0
...             ...
C.Kupp          18.0
R.Woods         21.0
N.Harry         7.0
S.Michel        -6.0

Each name has many values, and I would like to plot a distribution for each name on the same figure. I tried the hue argument, but it normalized all the distributions together, so that their combined area is 1. I want each distribution to be independent of the others, with its own area of 1. I would also like all of the curves to be gray, which hue doesn't do by default.

Edit: Also, when I use hue, I get this warning: UserWarning: Dataset has 0 variance; skipping density estimate.

Upvotes: 0

Views: 3498

Answers (1)

JohanC

Reputation: 80289

sns.kdeplot() has a parameter common_norm=, which defaults to True. In that case, each kde curve is scaled in proportion to its number of values, so that the areas of all curves together sum to 1. Setting common_norm=False scales every kde curve individually, so that each one has an area of 1.

Note that there is also a multiple= parameter, which defaults to 'layer' but can also be set to 'stack' or 'fill'. For those two settings, the common norm would be appropriate.

The curves can all be colored grey by passing a palette that is a list containing 'grey' once per hue value, so the length of the list equals the number of hue values. As all curves then have the same color, a legend would look strange. The legend can be suppressed with legend=False.

When a hue value appears in only one row, no kde curve is drawn for it. Instead, seaborn shows the warning Dataset has 0 variance; skipping density estimate.
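One way to avoid the warning is to drop such names before plotting. The sketch below is an assumption about how this could be done with pandas and is not from the original answer; drop_degenerate_groups is a hypothetical helper, and the column names 'name' and 'val' are taken from the question.

```python
import pandas as pd


def drop_degenerate_groups(df, group_col='name', value_col='val'):
    """Keep only rows whose group has at least two distinct values.

    Groups with a single row, or with all values identical, have zero
    variance, so no density can be estimated for them.
    """
    n_distinct = df.groupby(group_col, observed=True)[value_col].transform('nunique')
    kept = df[n_distinct > 1].copy()
    # For a categorical column, also forget the categories that were dropped,
    # so that a palette built from the categories keeps the right length
    if isinstance(kept[group_col].dtype, pd.CategoricalDtype):
        kept[group_col] = kept[group_col].cat.remove_unused_categories()
    return kept
```

The filtered frame can then be passed as data= to sns.kdeplot(). Recent seaborn versions also accept warn_singular=False in kdeplot() to silence the warning without filtering; check the documentation of the installed version before relying on it.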

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

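# Sample data: 100 rows spread unevenly over the names A-D
df = pd.DataFrame({'name': np.random.choice([*'ABCD'], 100, p=[0.4, 0.3, 0.2, 0.1]),
                   'val': np.random.rand(100).cumsum()})
df.loc[0, 'name'] = 'E'  # exactly one row with name 'E', which triggers the 0-variance warning
df['name'] = df['name'].astype('category')
# One 'grey' entry per category; common_norm=False gives each curve its own area of 1
sns.kdeplot(data=df, x='val', hue='name', palette=['grey'] * len(df['name'].cat.categories),
            common_norm=False, legend=False)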
plt.show()

example plot

Upvotes: 1
