Reputation: 1622
I have a big dictionary containing frequencies, like this:
frequency = {3: 231, 6: 373, 8: 455}
where the dictionary keys represent the lengths of the sentences and the values the number of sentences with that length.
I created the bar plot like this:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(list(frequency.keys()), list(frequency.values()), log=True, color='g', width=0.5)
ax.set_title('DISTRIBUTION OF SENTENCE LENGTH')
ax.set_xlabel('Sentence length')
ax.set_ylabel('Frequency')
plt.show()
The result is correct and is the following:
Now what I would like to do is draw the distribution of these values, something like this:
How can I do this? I have already tried to follow this post (and others like it), but with poor results. Thank you!
Upvotes: 1
Views: 2081
Reputation: 80279
Seaborn's histplot has a weights parameter, so the dictionary keys can be passed as the data and the dictionary values as their weights. It can also overlay a KDE curve via kde=True. The default bandwidth seems a bit too wide; it can be adjusted, for example via kde_kws={'bw_adjust': 0.3}. With discrete=True, the histogram bins are adapted to the discrete values.
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Build a synthetic frequency table: counts rise for short lengths, then decay
frequencies = {1: 2000}
for i in range(2, 10):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(1.02, 1.1))
for i in range(10, 500):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(0.97, 0.99))
    if frequencies[i] == 0:
        break

# Weight each length by its count; discrete=True adapts the bins to the integer values
ax = sns.histplot(x=list(frequencies.keys()), weights=list(frequencies.values()), discrete=True,
                  kde=True, kde_kws={'bw_adjust': 0.2}, line_kws={'linewidth': 3})
ax.margins(x=0.01)
plt.show()
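To see what the weights parameter is doing, here is a minimal NumPy-only sketch using the small dictionary from the question. It assumes unit-width bins centered on each integer, which is only an approximation of what discrete=True sets up, and the variable names are illustrative. Weighting each length by its count gives the same histogram as expanding the data so that every length is repeated as many times as it occurs:

```python
import numpy as np

# Frequencies from the question: sentence length -> number of sentences
frequency = {3: 231, 6: 373, 8: 455}

lengths = np.array(list(frequency.keys()))
counts = np.array(list(frequency.values()))

# Assumption: one unit-width bin centered on each integer length
bin_edges = np.arange(lengths.min() - 0.5, lengths.max() + 1.5)

# Weighted histogram: each length adds its count to the bin it falls in
weighted_hist, _ = np.histogram(lengths, bins=bin_edges, weights=counts)

# Same histogram built from the expanded sample (each length repeated count times)
expanded = np.repeat(lengths, counts)
expanded_hist, _ = np.histogram(expanded, bins=bin_edges)

print(weighted_hist)
print(expanded_hist)
```

Both arrays hold 231, 373 and 455 in the bins for lengths 3, 6 and 8, and zeros elsewhere. Passing weights avoids materializing the expanded sample, which matters when the counts are large.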
Upvotes: 2