Joe

Reputation: 923

Seaborn kdeplot not plotting some data?

I'm trying to get the Seaborn kdeplot example to work on my own data. For some reason, one of my two datasets isn't plotting at all, while the other plots fine. To make a minimal working example, I have sampled only 10 rows from each of my very large datasets.

My input data looks like this:

# Dataframe dfA
    index   x       y     category
0   595700  5   1.000000    14.0
1   293559  4   1.000000    14.0
2   562295  3   0.000000    14.0
3   219426  4   1.000000    14.0
4   592731  2   1.000000    14.0
5   178573  3   1.000000    14.0
6   553156  4   0.500000    14.0
7   385031  1   1.000000    14.0
8   391681  3   0.999998    14.0
9   492771  2   1.000000    14.0

# Dataframe dfB
    index   x      y      category
0   56345   3   1.000000    6.0
1   383741  4   1.000000    6.0
2   103044  2   1.000000    6.0
3   297357  5   1.000000    6.0
4   257508  3   1.000000    6.0
5   223600  2   0.999938    6.0
6   44530   2   1.000000    6.0
7   82925   3   1.000000    6.0
8   169592  3   0.500000    6.0
9   229482  4   0.285714    6.0

My code snippet looks like this:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")

# Set up the figure
f, ax = plt.subplots(figsize=(8, 8))

# Draw the two density plots
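# Keyword x=/y= and fill= are the seaborn >= 0.11 spellings of the
# positional arguments and shade=; the default thresh already leaves the
# lowest density level unfilled, as shade_lowest=False did.
ax = sns.kdeplot(x=dfA.x, y=dfA.y, cmap="Reds", fill=True)
ax = sns.kdeplot(x=dfB.x, y=dfB.y, cmap="Blues", fill=True)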

Why isn't the data from dataframe dfA actually plotting?

Upvotes: 2

Views: 3533

Answers (1)

mwaskom

Reputation: 49002

I don't think a Gaussian KDE is a good fit for either of your datasets. One variable (x) takes only a few discrete values, and the other (y) is a constant for the large majority of rows. Data like that is not well modeled by a smooth bivariate Gaussian kernel density.
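A quick way to see this on the 10 sampled rows of dfA from the question is to count distinct values and measure how often the most common value occurs. This is only a sketch on the posted sample; `summarize` is a hypothetical helper, and the full dataset may look different.

```python
from collections import Counter

# The 10 sampled rows of dfA, copied from the question.
x_a = [5, 4, 3, 4, 2, 3, 4, 1, 3, 2]
y_a = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.999998, 1.0]


def summarize(values):
    """Return (number of distinct values, most common value, its share)."""
    counts = Counter(values)
    mode_value, mode_count = counts.most_common(1)[0]
    return len(counts), mode_value, mode_count / len(values)


x_summary = summarize(x_a)
y_summary = summarize(y_a)

print("x: distinct values =", x_summary[0])
print("y: most common value =", y_summary[1], "share =", y_summary[2])
```

A small number of distinct x values, combined with one y value covering most rows, is the pattern described above.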

As for what exactly is happening, I cannot say for sure without the full dataset. My expectation is that the automatically chosen KDE bandwidth (particularly on the y axis) ends up extremely narrow, so the regions with non-negligible density are too small to see. You could try setting a wider bandwidth, but my advice would be to use a different kind of plot for this data.
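To illustrate how a bandwidth can collapse on data like this, here is a minimal sketch of a common normal-reference rule of thumb, h = 1.059 * min(std, IQR / 1.349) * n^(-1/5), applied per axis to the sampled rows. Which rule seaborn actually applies depends on the version and the backend it uses, so treat the formula and the helper name `rule_of_thumb_bandwidth` as assumptions made for illustration.

```python
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth using a robust spread estimate.

    Assumed rule (not necessarily what seaborn uses):
    h = 1.059 * min(std, IQR / 1.349) * n ** (-1/5)
    """
    n = len(values)
    std = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(std, (q3 - q1) / 1.349)
    return 1.059 * spread * n ** (-0.2)


# Sampled rows copied from the question.
x_a = [5, 4, 3, 4, 2, 3, 4, 1, 3, 2]
y_a = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.999998, 1.0]
y_b = [1.0, 1.0, 1.0, 1.0, 1.0, 0.999938, 1.0, 1.0, 0.5, 0.285714]

bw_x_a = rule_of_thumb_bandwidth(x_a)
bw_y_a = rule_of_thumb_bandwidth(y_a)
bw_y_b = rule_of_thumb_bandwidth(y_b)

print(f"dfA x bandwidth: {bw_x_a:.3g}")
print(f"dfA y bandwidth: {bw_y_a:.3g}")
print(f"dfB y bandwidth: {bw_y_b:.3g}")
```

Under this rule the y bandwidth is driven by the interquartile range, which is close to zero when most values are identical. If more than three quarters of the rows share one y value, the interquartile range is exactly zero and so is the bandwidth. Passing an explicit, wider bandwidth avoids that, although it does not make a KDE a better model for the data.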

Upvotes: 3
