Plot distribution data over bar plot

Question

I have a big dictionary containing frequencies, like this:

frequency = {3: 231, 6: 373, 8: 455}

where the dictionary keys represent the lengths of the sentences and the values the number of sentences with that length.

I created the bar plot like this:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# One bar per sentence length, with the y-axis on a log scale
ax.bar(list(frequency.keys()), list(frequency.values()), log=True, color='g', width=0.5)
ax.set_title('DISTRIBUTION OF SENTENCE LENGTH')
ax.set_xlabel('Sentence length')
ax.set_ylabel('Frequency')
plt.show()

The result is correct.

What I would like to do now is draw the distribution of these values on top of the bar plot.

How can I do this? I have already tried to follow this post (and others like it), but with poor results. Thank you!

JohanC · Accepted Answer

seaborn's histplot has a weights parameter, so the frequencies can be passed directly as weights instead of expanding the data. It can also add a KDE curve with kde=True. The default bandwidth seems a bit too wide; it can be adjusted via kde_kws, for example kde_kws={'bw_adjust': 0.3}. With discrete=True, the histogram bins are centered on the discrete values.

Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate test data: frequencies rise for short lengths, then decay slowly
frequencies = {1: 2000}
for i in range(2, 10):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(1.02, 1.1))
for i in range(10, 500):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(0.97, 0.99))
    if frequencies[i] == 0:
        break

# Lengths go on the x-axis; the counts are passed as weights
ax = sns.histplot(x=list(frequencies.keys()), weights=list(frequencies.values()), discrete=True,
                  kde=True, kde_kws={'bw_adjust': 0.2}, line_kws={'linewidth': 3})
ax.margins(x=0.01)
plt.show()
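
To see what the weights parameter is doing, here is a minimal sketch using only NumPy (not seaborn's internals, just an illustration). It checks that weighting each sentence length by its count gives the same histogram as expanding the dictionary into one entry per sentence. The dictionary is the small one from the question; the bin edges are an assumption, chosen so that each bin is centered on an integer length, similar in spirit to discrete=True.

```python
import numpy as np

frequency = {3: 231, 6: 373, 8: 455}
lengths = np.array(list(frequency.keys()))
counts = np.array(list(frequency.values()))

# Assumed bin edges: width 1, centered on each integer length
edges = np.arange(lengths.min() - 0.5, lengths.max() + 1.5)

# Option 1: one data point per length, weighted by its count
weighted, _ = np.histogram(lengths, bins=edges, weights=counts)

# Option 2: repeat each length `count` times, no weights
expanded, _ = np.histogram(np.repeat(lengths, counts), bins=edges)

# Both approaches should yield identical bin heights
print(np.array_equal(weighted, expanded))
```

The weighted form avoids building the expanded array, which matters when the counts are large.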
