Reputation: 183
I have a question about probability distribution functions. I have time series data and I want to calculate the probability distribution of the data in different time windows.
I wrote the following code, but I could not find the value of the probability distribution for this data.
import pandas as pd
a = pd.DataFrame([0.0,
21.660332407421638,
20.56428943581567,
20.597329924045983,
19.313207915827956,
19.104973174542806,
18.031361568112377,
17.904747973652125,
16.705687654209264,
16.534206966165637,
16.347782724271802,
13.994284547628721,
12.870120434556945,
12.794530081249571,
10.660675400742669])
This is the histogram and density plot of my data:
a.plot.hist()
a.plot.density()
But I don't know how to calculate the area under the density curve.
Upvotes: 7
Views: 5205
Reputation: 1460
You can call scipy.stats.gaussian_kde directly;
pandas also uses it internally for its density plot.
It returns a callable object that evaluates the estimated density at any point.
You can then pass that object to one of the routines in scipy.integrate
to calculate the area under the kernel density estimate, e.g.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, integrate
kde = stats.gaussian_kde(a[0])
# Calculate the integral of the kde between 10 and 20:
xmin, xmax = 10, 20
integral, err = integrate.quad(kde, xmin, xmax)
x = np.linspace(-5,20,100)
x_integral = np.linspace(xmin, xmax, 100)
plt.plot(x, kde(x), label="KDE")
plt.fill_between(x_integral, 0, kde(x_integral),
alpha=0.3, color='b', label="Area: {:.3f}".format(integral))
plt.legend()
plt.show()
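As a cross-check on the numerical integration, here is a minimal sketch that compares quad against the KDE object's own integrate_box_1d method, which computes the same area analytically from the Gaussian kernels. The names values, area_quad, area_box and total are only illustrative, and the array repeats the data from the question so that the snippet runs on its own.

```python
import numpy as np
from scipy import stats, integrate

# Same values as in the question, repeated so the snippet is self-contained.
values = np.array([
    0.0, 21.660332407421638, 20.56428943581567, 20.597329924045983,
    19.313207915827956, 19.104973174542806, 18.031361568112377,
    17.904747973652125, 16.705687654209264, 16.534206966165637,
    16.347782724271802, 13.994284547628721, 12.870120434556945,
    12.794530081249571, 10.660675400742669,
])

kde = stats.gaussian_kde(values)
xmin, xmax = 10, 20

# Numerical integration; kde(t) returns a length-1 array, so take element 0.
area_quad, _ = integrate.quad(lambda t: kde(t)[0], xmin, xmax)

# Analytic integration of the 1-D KDE between the same bounds.
area_box = kde.integrate_box_1d(xmin, xmax)

# The density should integrate to 1 over the whole real line.
total = kde.integrate_box_1d(-np.inf, np.inf)

print("quad:", area_quad, "box:", area_box, "total:", total)
```

The two areas should agree to within quad's error estimate. The area between xmin and xmax can be read as the estimated probability that a value falls in that range, under the assumption that the KDE is a reasonable model for the data.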
Upvotes: 8