maciek

Reputation: 2107

python pandas dataframe KDEplot density levels coloring issue

I have a pandas DataFrame and I want to create a KDE plot. First, I call:

g = sns.jointplot(x="s_zscore", y="p_zscore", s=2, data=scatter_all, kind="scatter")

enter image description here

That looks fine: as we can see, the data is scattered into four 2D clusters. A regular KDE plot confirms this:

g = sns.jointplot(x="s_zscore", y="p_zscore", s=2, data=scatter_all, kind="kde")

enter image description here

But when I provide my own density levels, weird things happen: for some reason I get a white circle in the region with the highest density:

g = sns.jointplot(x="s_zscore", y="p_zscore", s=2, data=scatter_all, kind="kde", levels=density_levels)

enter image description here

This is clearly wrong. Does anyone know what the hell is happening here?

=======================================
EDIT:
OK, I figured out it has something to do with the levels I provide:

g = sns.jointplot(x="s_zscore", y="p_zscore", s=2, data=scatter_all, kind="kde", levels=density_levels + [0.03])

enter image description here

g = sns.jointplot(x="s_zscore", y="p_zscore", s=2, data=scatter_all, kind="kde", levels=density_levels + [0.09])

enter image description here

So now: how should I choose the maximal value for the largest isoline? density_levels contains the percentiles at which I want to draw a boundary.

Upvotes: 0

Views: 751

Answers (1)

maciek

Reputation: 2107

OK, I figured it out; posting an answer for future generations:
It seems that seaborn's kdeplot() uses matplotlib's contourf(), which, as its documentation describes, fills the areas between consecutive levels. With N levels it fills N-1 bands, and by default nothing is drawn above the highest level, so to begin with I was missing an upper bound on my densities.
Secondly, the colors depend on the upper bound one provides. That is because kdeplot() uses a color map by default and stretches the [min,max] range of the levels across the color space. If the maximal value is far away from the rest of the isolines, one gets an intense center with very faint areas around it.
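To make the stretching concrete, here is a minimal pure-Python sketch. It is a simplified model, not matplotlib's actual code: it assumes each filled band is colored by the midpoint of its two levels, rescaled linearly so that the lowest level maps to 0 and the highest to 1 on the color map. The function name colormap_positions and the level values are hypothetical, chosen only for illustration.

```python
def colormap_positions(levels):
    """Return the 0..1 colormap position of each filled band.

    Simplified model (an assumption, not matplotlib's real implementation):
    the band between two consecutive levels is colored by its midpoint,
    linearly rescaled so min(levels) -> 0 and max(levels) -> 1.
    """
    lo, hi = min(levels), max(levels)
    bands = zip(levels[:-1], levels[1:])
    return [((a + b) / 2 - lo) / (hi - lo) for a, b in bands]


# Hypothetical density levels, for illustration only.
density_levels = [0.01, 0.02, 0.03, 0.04]

# Upper bound close to the other levels: bands spread evenly over the colormap.
close = colormap_positions(density_levels + [0.05])

# Upper bound far away: the lower bands are squeezed into the pale end.
far = colormap_positions(density_levels + [1.0])

print([round(p, 3) for p in close])
print([round(p, 3) for p in far])
```

With the far-away bound, the three lowest bands all land within the first few percent of the color map, which would explain why they look almost identical while the center stands out.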
The fix for the washed-out colors is to pass the colors explicitly with the "colors" parameter and turn off the color map:

fifty_shades_of_grey = ["#f3f3f3","#e6e6e6","#d9d9d9","#cccccc","#bfbfbf"]
sns.palplot(sns.color_palette(fifty_shades_of_grey))

enter image description here

g = sns.jointplot(x="s_zscore", y="p_zscore", data=scatter_all, kind="kde", levels=density_levels + [1], colors=fifty_shades_of_grey, cmap=None)

enter image description here
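The white circle and the effect of the extra upper bound can be reproduced with a toy model of how a density value is assigned to a filled band. This is only a sketch: band_index is a hypothetical helper, the level values and the peak density are made up, and the interval convention (each band closed at the top, the lowest band closed on both sides, nothing filled outside the levels) is taken from the contourf documentation for the default case without extension.

```python
from bisect import bisect_left


def band_index(value, levels):
    """Index of the filled band containing value, or None if it is unfilled.

    Toy model of a filled contour plot without extension: band i covers
    levels[i] < value <= levels[i + 1], and the lowest band also includes
    levels[0]. Values outside [levels[0], levels[-1]] get no fill at all.
    """
    if value < levels[0] or value > levels[-1]:
        return None
    return max(bisect_left(levels, value) - 1, 0)


# Hypothetical values, for illustration only.
density_levels = [0.01, 0.02, 0.03, 0.04, 0.05]
fifty_shades_of_grey = ["#f3f3f3", "#e6e6e6", "#d9d9d9", "#cccccc", "#bfbfbf"]
peak_density = 0.08  # assumed maximum of the estimated density

# Without an upper bound the peak lies above the last level: no fill (white hole).
without_bound = band_index(peak_density, density_levels)

# With an upper bound above the peak, the peak falls into the last band.
with_bound = band_index(peak_density, density_levels + [1])

print(without_bound)
print(fifty_shades_of_grey[with_bound])
```

Six levels give five bands, which is why five colors are enough once the upper bound is appended. With explicit colors, the exact value of the bound does not matter as long as it is at least the peak density.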

Case closed, Watson.

Upvotes: 2
