Reputation: 131038
I have a pandas DataFrame and would like to calculate the density function of the values in one of its columns. It would be nice to have something like this:
df['col_name'].dens()
However, if nothing like that exists, I can put all these values into a list and then use some other function that calculates a density function for the values in a list. It would be great if I could do it with any of these packages: scipy, numpy, ipython, scikit.
Upvotes: 1
Views: 2375
Reputation: 27549
You can use scipy.stats.gaussian_kde and pass it the DataFrame column directly:
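import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(data={'a': np.random.randn(100)})  # 100 normally distributed values
g = stats.gaussian_kde(df.a)  # fit a Gaussian kernel density estimate to the column
[g(x)[0] for x in np.linspace(-3, 3, 10)]  # evaluate the density at 10 points in [-3, 3]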
gives (the exact values vary between runs, since the data is random):
[0.010404194709511637,
0.028412197910606129,
0.093548960033717946,
0.1915548075057672,
0.29626128014747688,
0.3402226687259407,
0.29679380013692241,
0.15516355334523385,
0.057147975947743457,
0.020153062250794138]
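The fitted object can also be called on a whole array at once, which avoids the list comprehension. Below is a minimal sketch of that, assuming a fixed random seed (my addition, only there to make the run repeatable) and the same column name 'a'. It also checks that the estimate behaves like a density by integrating it over a wide interval with integrate_box_1d.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Fixed seed is an assumption added for repeatability; it is not in the answer above.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(100)})

kde = stats.gaussian_kde(df["a"])

# Evaluate the density at all grid points in one vectorized call.
xs = np.linspace(-3, 3, 10)
density = kde(xs)  # NumPy array with one density value per grid point

# Probability mass of the estimate over [-10, 10]; should be very close to 1.
total = kde.integrate_box_1d(-10, 10)
```

If a plain list is needed, density.tolist() converts the result.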
Upvotes: 3
Reputation: 4894
If all you want is a density plot: df['col_name'].plot(kind='density')
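A self-contained sketch of that one-liner, assuming matplotlib and scipy are installed (pandas relies on both for density plots). The column name 'col_name', the random data, and the output file name density.png are placeholders of mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display

import numpy as np
import pandas as pd

# Placeholder data: 100 normally distributed values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"col_name": rng.standard_normal(100)})

# Draws a kernel density estimate of the column and returns the matplotlib Axes.
ax = df["col_name"].plot(kind="density")
ax.set_xlabel("col_name")

# Hypothetical output file name; use plt.show() instead in an interactive session.
ax.figure.savefig("density.png")
```

This only draws the curve. To get the density values themselves, use gaussian_kde as in the other answer.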
Upvotes: 1