develarist

Reputation: 1365

How to extract density function probabilities in python (pandas kde)

The pandas.plot.kde() function is handy for plotting the estimated density function of a continuous random variable. It takes data x as input and plots the estimated probability density p(x) over a range of x values.

How can I extract the values it computes? Instead of just plotting the estimated density, I would like an array or pandas Series containing the density values it computed internally.

If this can't be done with pandas' kde, please let me know of an equivalent in scipy or another library.

Upvotes: 14

Views: 12762

Answers (1)

My Work

Reputation: 2508

There are several ways to do that: you can either compute it yourself or extract it from the plot.

  1. As pointed out by @RichieV in the comments on this post, you can extract the data from the plot, which gives an (N, 2) array of x values (column 0) and their estimated densities (column 1):
data.plot.kde().get_lines()[0].get_xydata()
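A minimal, self-contained sketch of approach 1, assuming a numeric pandas Series (the random sample `data` below is hypothetical). It draws the KDE on a fresh axes, pulls the (x, density) pairs off the line into a Series, and then checks them against scipy.stats.gaussian_kde evaluated on the same grid, since pandas delegates to that estimator (Scott's rule bandwidth by default).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window opens
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical sample data; replace with your own numeric Series
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(loc=40, scale=10, size=500))

# Draw on a fresh axes so the KDE curve is line 0
fig, ax = plt.subplots()
data.plot.kde(ax=ax)

# get_xydata() returns an (N, 2) array: column 0 is x, column 1 is the density
xy = ax.get_lines()[0].get_xydata()
plt.close(fig)
kde_values = pd.Series(xy[:, 1], index=xy[:, 0], name="density")

# pandas uses scipy.stats.gaussian_kde internally, so evaluating it on the
# same x grid should reproduce the plotted values
recomputed = scipy.stats.gaussian_kde(data)(kde_values.index.to_numpy())
print(np.allclose(kde_values.to_numpy(), recomputed))
```

Closing the figure after reading the line keeps memory from growing if this runs in a loop.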
  2. Use seaborn, then extract the values as in 1):

You can use seaborn to estimate the kernel density and then matplotlib to extract the values (as in this post). You can use either distplot or kdeplot:

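import matplotlib.pyplot as plt
import seaborn as sns

# kde plot (draw on a fresh axes so line 0 is the KDE curve)
fig, ax = plt.subplots()
x, y = sns.kdeplot(data, ax=ax).get_lines()[0].get_data()
# distplot (deprecated since seaborn 0.11; kdeplot is the replacement)
fig, ax = plt.subplots()
x, y = sns.distplot(data, hist=False, ax=ax).get_lines()[0].get_data()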

  3. You can use scipy.stats.gaussian_kde, which is what pandas uses under the hood, to estimate the kernel density directly:
import numpy as np
import scipy.stats

density = scipy.stats.gaussian_kde(data)

Then evaluate it on a set of points:

x = np.linspace(data.min(), data.max(), 200)  # 200 evenly spaced points across the data range
y = density(x)  # estimated density at each point
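Putting approach 3 together as a small self-contained sketch (the sample `data` and the interval 30 to 50 are hypothetical). Note that the values returned by `density(x)` are probability densities, not probabilities: they can exceed 1 and only integrate to 1. For the probability that the variable falls in an interval, `gaussian_kde.integrate_box_1d` integrates the estimate between two bounds.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical sample data; replace with your own numeric Series
rng = np.random.default_rng(2)
data = pd.Series(rng.normal(loc=40, scale=10, size=500))

# Fit the KDE (bw_method=None means Scott's rule, the same default pandas uses)
density = scipy.stats.gaussian_kde(data)

# Evaluate on a grid extending half the data range past each end
span = data.max() - data.min()
x = np.linspace(data.min() - 0.5 * span, data.max() + 0.5 * span, 1000)
kde_values = pd.Series(density(x), index=x, name="density")

# The densities integrate to about 1 over the grid (trapezoidal rule)
y = kde_values.to_numpy()
area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2)

# An actual probability: P(30 <= X <= 50) under the estimated density
p_30_50 = density.integrate_box_1d(30, 50)
print(round(area, 3), round(p_30_50, 3))
```

Building the grid yourself (rather than reading it off a plot) means no plotting backend is needed and you control both the range and the resolution.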

Upvotes: 19
