Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency on the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot()).

I have read the documentation and many similar questions here on Stack Overflow.

The common answer is to set norm_hist=False and to pass the weights as a NumPy array, as in a standard histogram. However, the plot keeps showing the density and not the probability/frequency of each bin.

My code is

plt.figure(figsize=(10, 4))
plt.xlim(-0.145,0.145)
plt.axvline(0, color='grey')
data = df['col1']

x = np.random.normal(data.mean(), scale=data.std(), size=100000)
normal_dist = sns.distplot(x, hist=False, color="red", label="Gaussian")

data_viz = sns.distplot(data, color="blue", bins=31, label="data", norm_hist=False)

# I also tried adding the weights inside the argument
#hist_kws={'weights': np.ones(len(data))/len(data)})

plt.legend(bbox_to_anchor=(1, 1), loc=1)

And I keep receiving this output:

[Image: the resulting plot]

Does anyone have an idea of what could be the problem here?

Thanks!

[EDIT]: The problem is that the y-axis is showing the KDE values and not those from the weighted histogram. If I set kde=False, I can display the frequency on the y-axis. However, I still want to keep the KDE, so I am not considering that option.

Upvotes: 0

Views: 2361

Answers (1)

Indrit

Reputation: 35

Showing the KDE and the frequency/count on a single y-axis will not work, because the two are on different scales. It may be better to create a plot with two y-axes, one for the histogram and one for the KDE. From the documentation for norm_hist: "If True, the histogram height shows a density rather than a count. **This is implied if a KDE or fitted density is plotted**."
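To make the scale mismatch concrete, here is a minimal NumPy-only sketch. The `data` array is a hypothetical stand-in for `df['col1']` (the real column is not shown in the question), and the 31 bins are taken from the question's code. It compares the frequency of each bin (weights of 1/N) with the density of each bin, and checks that the two differ by exactly the bin width.

```python
import numpy as np

# Hypothetical stand-in for df['col1']; the real data is not available here.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=0.03, size=1000)

# Frequency per bin: every observation gets weight 1/N, so the bar
# heights sum to 1.
weights = np.ones(len(data)) / len(data)
freq, edges = np.histogram(data, bins=31, weights=weights)

# Density per bin: heights are scaled so that the total bar *area* is 1.
dens, _ = np.histogram(data, bins=31, density=True)

bin_width = np.diff(edges)

print("sum of frequencies:", freq.sum())
print("area under density:", (dens * bin_width).sum())
print("max frequency:", freq.max(), "max density:", dens.max())

# density * bin_width recovers the frequency of each bin.
print(np.allclose(dens * bin_width, freq))
```

Since density equals frequency divided by bin width, narrow bins (as with data confined to roughly -0.145 to 0.145) push the density values far above the frequencies, which is why a KDE curve and a frequency histogram do not sit comfortably on one axis.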

versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:

# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.

first_ax  = sns.distplot(data, kde=False)
second_ax = first_ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])

If you need this just for visualization it should be good enough.

Upvotes: 1
