Reputation: 923
I'm trying to get the Seaborn kdeplot example to work on my own data. One of my two datasets isn't plotting at all, while the other plots fine. As a minimal working example, I have sampled only 10 rows from each of my very large datasets.
My input data looks like this:
# Dataframe dfA
index x y category
0 595700 5 1.000000 14.0
1 293559 4 1.000000 14.0
2 562295 3 0.000000 14.0
3 219426 4 1.000000 14.0
4 592731 2 1.000000 14.0
5 178573 3 1.000000 14.0
6 553156 4 0.500000 14.0
7 385031 1 1.000000 14.0
8 391681 3 0.999998 14.0
9 492771 2 1.000000 14.0
# Dataframe dfB
index x y category
0 56345 3 1.000000 6.0
1 383741 4 1.000000 6.0
2 103044 2 1.000000 6.0
3 297357 5 1.000000 6.0
4 257508 3 1.000000 6.0
5 223600 2 0.999938 6.0
6 44530 2 1.000000 6.0
7 82925 3 1.000000 6.0
8 169592 3 0.500000 6.0
9 229482 4 0.285714 6.0
My code snippet looks like this:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="darkgrid")
# Set up the figure
f, ax = plt.subplots(figsize=(8, 8))
# Draw the two density plots
ax = sns.kdeplot(dfA.x, dfA.y,
                 cmap="Reds", shade=True, shade_lowest=False)
ax = sns.kdeplot(dfB.x, dfB.y,
                 cmap="Blues", shade=True, shade_lowest=False)
Why isn't the data from dataframe dfA actually plotting?
Upvotes: 2
Views: 3533
Reputation: 49002
I don't think Gaussian KDE is a good fit for either of your datasets. You have one variable (x) that takes only discrete values and another (y) where the large majority of values appear to be a single constant. Neither is well modeled by a smooth bivariate Gaussian kernel estimate.
As for what exactly is happening, I cannot say for sure without the full dataset, but I expect that the KDE bandwidth (particularly on the y axis) ends up so narrow that the regions with non-negligible density are tiny and effectively invisible at the plot's scale. You could try setting a wider bandwidth, but my advice would be to use a different kind of plot for this data.
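To make the bandwidth concern concrete, here is a minimal NumPy sketch that applies one common rule-of-thumb bandwidth to the sampled columns from the question. The helper name rule_of_thumb_bandwidth is hypothetical, and the formula (a Scott-style rule that uses the smaller of the standard deviation and IQR / 1.349 as the spread) is an assumption chosen for illustration. Whether your seaborn version and its KDE backend use this exact rule is not something I can confirm from the question.

```python
import numpy as np


def rule_of_thumb_bandwidth(values):
    """Scott-style bandwidth with a robust spread estimate.

    Hypothetical helper for illustration only: it is an assumption that
    the plotting backend behaves similarly, not a copy of seaborn's code.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    std = values.std(ddof=1)                     # sample standard deviation
    q75, q25 = np.percentile(values, [75, 25])   # quartiles for the IQR
    robust = (q75 - q25) / 1.349                 # IQR rescaled to a std-like unit
    spread = min(std, robust)                    # near-constant data gives spread close to 0
    return 1.059 * spread * n ** (-0.2)


# The 10-row samples posted in the question
a_x = [5, 4, 3, 4, 2, 3, 4, 1, 3, 2]
a_y = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.999998, 1.0]
b_y = [1.0, 1.0, 1.0, 1.0, 1.0, 0.999938, 1.0, 1.0, 0.5, 0.285714]

for name, col in [("dfA.x", a_x), ("dfA.y", a_y), ("dfB.y", b_y)]:
    print(name, rule_of_thumb_bandwidth(col))
```

On these samples the y bandwidths come out several orders of magnitude smaller than the x bandwidth, and the one for dfA.y is the smallest, because the interquartile range of a nearly constant column is close to zero. If you still want a KDE, older seaborn releases accept a bw argument to kdeplot and releases from 0.11 onward accept bw_method and bw_adjust, so check the documentation for the version you have installed.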
Upvotes: 3