MSN

Reputation: 183

Calculating probability distribution from time series data in python

I have a question about probability distribution functions. I have time series data, and I want to calculate the probability distribution of the data in different time windows.

I have developed the following code, but I could not get the value of the probability distribution from it.

import pandas as pd

a = pd.DataFrame([0.0,
21.660332407421638,
20.56428943581567,
20.597329924045983,
19.313207915827956,
19.104973174542806,
18.031361568112377,
17.904747973652125,
16.705687654209264,
16.534206966165637,
16.347782724271802,
13.994284547628721,
12.870120434556945,
12.794530081249571,
10.660675400742669])

This is the histogram and density plot of my data:

a.plot.hist()
a.plot.density()

But I don't know how to calculate the area under the density curve.

Upvotes: 7

Views: 5205

Answers (1)

jdamp

Reputation: 1460

You can directly call scipy.stats.gaussian_kde, which pandas also uses internally for its density plot. It returns the kernel density estimate as a callable object. You can then use one of the routines from scipy.integrate to calculate areas under the kernel density estimate, e.g.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, integrate

kde = stats.gaussian_kde(a[0])

# Calculate the integral of the kde between 10 and 20:
xmin, xmax = 10, 20
integral, err = integrate.quad(kde, xmin, xmax)

x = np.linspace(-5,20,100)
x_integral = np.linspace(xmin, xmax, 100)

plt.plot(x, kde(x), label="KDE")
plt.fill_between(x_integral, 0, kde(x_integral),
                 alpha=0.3, color='b', label="Area: {:.3f}".format(integral))
plt.legend()
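
As a cross-check, and to address the "different time windows" part of the question, here is a minimal sketch. It assumes the same 15 values as in the question, uses gaussian_kde's own integrate_box_1d method in place of numerical quadrature, and splits the series into consecutive windows of 5 samples. The window length is an arbitrary choice for illustration, not something given in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats, integrate

# Same values as in the question.
a = pd.DataFrame([0.0, 21.660332407421638, 20.56428943581567,
                  20.597329924045983, 19.313207915827956,
                  19.104973174542806, 18.031361568112377,
                  17.904747973652125, 16.705687654209264,
                  16.534206966165637, 16.347782724271802,
                  13.994284547628721, 12.870120434556945,
                  12.794530081249571, 10.660675400742669])

kde = stats.gaussian_kde(a[0])
xmin, xmax = 10, 20

# Numerical quadrature; kde(x) returns a 1-element array, so take [0].
area_quad, _ = integrate.quad(lambda x: kde(x)[0], xmin, xmax)

# Integral of the 1-D KDE between two bounds, computed by the KDE object.
area_box = kde.integrate_box_1d(xmin, xmax)

# The density should integrate to 1 over the whole real line.
total = kde.integrate_box_1d(-np.inf, np.inf)

# One KDE per time window. window = 5 is an assumed, illustrative value.
window = 5
window_probs = {}
for start in range(0, len(a), window):
    chunk = a[0].iloc[start:start + window]
    window_kde = stats.gaussian_kde(chunk)
    # Estimated probability mass between xmin and xmax in this window.
    window_probs[(start, start + len(chunk))] = window_kde.integrate_box_1d(xmin, xmax)

print(area_quad, area_box, total)
print(window_probs)
```

Note that gaussian_kde needs at least two distinct values per window, so very short or constant windows would have to be handled separately.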


Upvotes: 8
