Roman

Reputation: 131038

How to calculate a density distribution for a set of values in Python?

I have a pandas DataFrame and would like to calculate the density distribution function of the values in one of its columns. It would be nice to have something like this:

df['col_name'].dens()

However, if something like that does not exist, I can put all these values into a list and then use some other function that calculates a density distribution function for the values in a list. It would be great if I could do it with any of these packages: scipy, numpy, ipython, scikit.
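If a smooth curve is not required, NumPy alone can give a histogram-based density for a plain list of values. Note that this is a different, coarser technique than the kernel density estimate used in the answers below. The sample data, the seed, and the choice of 20 bins are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical sample values; in practice these could come from df['col_name'].tolist()
values = np.random.default_rng(0).normal(size=1000)

# density=True rescales the bin heights so the histogram's total area is 1
density, edges = np.histogram(values, bins=20, density=True)
widths = np.diff(edges)                  # width of each bin
centers = (edges[:-1] + edges[1:]) / 2   # midpoint of each bin, handy for plotting

# Total area under the histogram: 1.0 up to floating-point rounding
print(np.sum(density * widths))
```

Each entry of `density` is the estimated probability density over the corresponding bin, so it can be paired with `centers` much like the output of a KDE evaluated on a grid.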

Upvotes: 1

Views: 2375

Answers (2)

Daniel

Reputation: 27549

You can use scipy.stats.gaussian_kde, which builds a Gaussian kernel density estimate, and pass it the DataFrame column directly:

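import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(data={'a': np.random.randn(100)})  # 100 normally distributed values
g = stats.gaussian_kde(df.a)  # fit a Gaussian kernel density estimate to the column
[g(x)[0] for x in np.linspace(-3, 3, 10)]  # evaluate the density at 10 points in [-3, 3]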

gives (the exact values differ between runs, since the sample is random):

[0.010404194709511637,
 0.028412197910606129,
 0.093548960033717946,
 0.1915548075057672,
 0.29626128014747688,
 0.3402226687259407,
 0.29679380013692241,
 0.15516355334523385,
 0.057147975947743457,
 0.020153062250794138]
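A slightly extended sketch of the same gaussian_kde approach, assuming a seeded NumPy generator so the numbers are reproducible. It shows that the estimator accepts either the pandas column or a plain Python list (covering the "put the values in a list" case from the question), that the fitted object can evaluate a whole grid in one call instead of one point at a time, and that the estimate integrates to roughly 1. The integration bounds of -50 and 50 are an arbitrary choice that comfortably covers this sample.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)  # seeded so results are reproducible
df = pd.DataFrame({'a': rng.standard_normal(100)})

kde = stats.gaussian_kde(df['a'])                # a pandas Series works directly
kde_list = stats.gaussian_kde(df['a'].tolist())  # so does a plain Python list

grid = np.linspace(-3, 3, 10)
dens = kde(grid)  # evaluates all grid points at once; ndarray of shape (10,)

# A proper density estimate: its total probability mass is (almost exactly) 1
print(kde.integrate_box_1d(-50, 50))
```

Evaluating the whole grid at once gives the same numbers as the list comprehension above, just without the Python-level loop.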

Upvotes: 3

herrfz

Reputation: 4894

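If all you want is a density plot, pandas can draw one directly (this needs matplotlib, and scipy for the underlying kernel density estimate): df['col_name'].plot(kind='density')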

Upvotes: 1
