licm

Reputation: 99

density distribution and bar plot from x and y data

I have the following data set in a pandas dataframe:

x = df_data.iloc[:,0].values
y = df_data.iloc[:,1].values

The following data is in x and y, respectively:

x = 30, 31, 32, 33, 34, 35, 36
y = 1000, 2000, 3000, 4000, 3000, 2000, 1000

y represents the counts (how often each x value exists).

I now want to make a bar plot with the density distribution line. I'm open to using seaborn or matplotlib, but couldn't find a way to enter x and y data separately and to obtain the bar plot plus the density plot.

I've tried this:

x = [30,31,32,33,34,35,36]
y = [1000, 2000, 3000, 4000, 3000, 2000, 1000]
##
sns.distplot(x, hist=True, kde=True,
    bins=int(150/150), color='darkblue',
    hist_kws={'edgecolor':'black'},
    kde_kws={'linewidth': 4})
plt.show()

but didn't get what I wanted.

I would like to have something like below (just for my data)


(I got this image from: https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0)

Upvotes: 1

Views: 4559

Answers (1)

JohanC

Reputation: 80279

First off, note that distplot has been deprecated since Seaborn 0.11. Its extended and improved replacements are histplot (histogram with optional kde), kdeplot (for just a kde) and displot (creates subplots).

The optional weights= parameter sets weights for each of the x values. discrete=True is needed to have a bar for each x value. The cut parameter of the kde controls how far the curve is drawn outside the data points.

import matplotlib.pyplot as plt
import seaborn as sns

x = [30, 31, 32, 33, 34, 35, 36]
y = [1000, 2000, 3000, 4000, 3000, 2000, 1000]

sns.histplot(x=x, weights=y, discrete=True,
             color='darkblue', edgecolor='black',
             kde=True, kde_kws={'cut': 2}, line_kws={'linewidth': 4})
plt.show()

histplot with weights
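
To make the role of cut more concrete, here is a minimal NumPy sketch of a weighted Gaussian kde. It assumes Scott's rule with an effective sample size for the bandwidth and a simple (biased) weighted standard deviation; seaborn's own implementation may differ in these details, so the numbers are illustrative only. The names w, neff, bw and grid exist only in this example.

```python
import numpy as np

x = np.array([30, 31, 32, 33, 34, 35, 36], dtype=float)
y = np.array([1000, 2000, 3000, 4000, 3000, 2000, 1000], dtype=float)

w = y / y.sum()                                # normalized weights
mean = np.sum(w * x)
std = np.sqrt(np.sum(w * (x - mean) ** 2))     # weighted std (biased, assumption)
neff = 1.0 / np.sum(w ** 2)                    # effective sample size
bw = std * neff ** (-1 / 5)                    # Scott's rule in 1-D (assumption)

# The evaluation grid reaches cut * bw beyond the smallest and largest x.
cut = 2
grid = np.linspace(x.min() - cut * bw, x.max() + cut * bw, 200)

# Weighted sum of Gaussian bumps, one centred on each x value.
z = (grid[:, None] - x[None, :]) / bw
density = (w * np.exp(-0.5 * z ** 2)).sum(axis=1) / (bw * np.sqrt(2 * np.pi))

# Trapezoidal estimate of the area under the curve on this grid.
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"bandwidth: {bw:.3f}, area under curve: {area:.4f}")
```

With cut=0 the grid would stop exactly at 30 and 36 and the curve would end abruptly; larger values let it fall off towards zero, which is why the area gets closer to 1.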

Note that if the underlying data is continuous, you'd get a much more accurate plot by providing the original data.
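
As a rough illustration of how weights relate to raw observations, the sketch below expands the counts into one entry per observation with np.repeat and checks that a plain histogram of those samples matches a weighted histogram of x. The bin edges centred on the integers are my own choice here, meant to mimic discrete=True. If you have the real per-observation values, you would pass those directly instead of reconstructing them.

```python
import numpy as np

x = np.array([30, 31, 32, 33, 34, 35, 36])
y = np.array([1000, 2000, 3000, 4000, 3000, 2000, 1000])

# One entry per observation: 1000 times 30, 2000 times 31, and so on.
samples = np.repeat(x, y)

# Unit-width bins centred on each integer x value.
edges = np.arange(x.min() - 0.5, x.max() + 1.5)

counts_from_samples, _ = np.histogram(samples, bins=edges)
counts_from_weights, _ = np.histogram(x, bins=edges, weights=y)

# Both routes give the same bar heights.
print(np.array_equal(counts_from_samples, counts_from_weights))
```

Note that the expanded array only contains the seven distinct x values, so it carries no more information than x and y do; truly continuous measurements would still give a smoother kde.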

To change the color of the kde line, an obvious idea would be to use line_kws={'color': 'red'}, but this doesn't work in the current seaborn version (0.11.1).

However, you can draw a histplot and kdeplot separately. In order to have matching y-axes, the histplot needs stat='density' (the default is 'count').

ax = sns.histplot(x=x, weights=y, discrete=True, alpha=0.5,
                  color='darkblue', edgecolor='black', stat='density')
sns.kdeplot(x=x, weights=y, color='crimson', cut=2, linewidth=4, ax=ax)
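
To see why stat='density' puts the bars on the same scale as the kde, here is a small sketch of the normalization, assuming unit-width bins (as with discrete=True on integer data): each count is divided by the total count times the bin width, so the bar areas sum to 1, just like the area under a density curve.

```python
import numpy as np

y = np.array([1000, 2000, 3000, 4000, 3000, 2000, 1000], dtype=float)
bin_width = 1.0  # assumption: one bar per integer x value

# Density height of each bar: count / (total count * bin width).
density = y / (y.sum() * bin_width)

# The bars now cover a total area of 1.
total_area = np.sum(density * bin_width)
print(density.max(), total_area)
```

With stat='count' the tallest bar would be 4000 while the kde peaks well below 1, so the curve would be squashed flat against the x-axis.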

Another approach is to change the color of the line afterwards, which works independently of the chosen stat=.

ax = sns.histplot(x=x, weights=y, discrete=True,
                  color='darkblue', edgecolor='black',
                  kde=True, kde_kws={'cut': 2}, line_kws={'linewidth': 4})
ax.lines[0].set_color('crimson')

sns.histplot with changed line color

Here is an example of how a histogram for one dataset can be combined with a kde curve of another dataset:

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import seaborn as sns

x = [30, 31, 32, 33, 34, 35, 36]
y = [1000, 2000, 3000, 4000, 3000, 2000, 1000]
x2 = [20, 21, 22, 23, 24, 25, 26]
y2 = [1000, 2000, 3000, 4000, 3000, 2000, 1000]

ax = sns.histplot(x=x2, weights=y2, discrete=True, alpha=0.5,
                  color='darkblue', edgecolor='black', stat='density')
sns.kdeplot(x=x, weights=y, color='crimson', cut=2, linewidth=4, ax=ax)
ax.xaxis.set_major_locator(MultipleLocator(1))
plt.show()

combining a histplot with a kdeplot of a different dataset

Upvotes: 3
