Jane Wayne

Reputation: 8855

How do I estimate the 80% cumulative distribution point in scipy, numpy and/or Python?

Seaborn's kdeplot function draws a cumulative distribution of the data when you pass cumulative=True. I need to find the value on the x-axis at which the cumulative distribution reaches 80%, and then draw a vertical line at that value.

Is there a method in numpy, scipy or elsewhere in Python that can compute that value?

Upvotes: 0

Views: 220

Answers (1)

koko

Reputation: 198

If you already have the CDF, you can do the following. I'm not sure how your data is formatted, but assuming you have two arrays, one of x-values and one of y-values, you can search for the index of the last y-value at or below 0.8. The corresponding x-value is approximately what you're looking for. Since the y-values of a CDF are already sorted in increasing order, a quick way to do this is a binary search:

import bisect
index = bisect.bisect_right(y_vals, 0.8) - 1

This picks the grid point at or just below the 80% level rather than the exact crossing. If you want a slightly more accurate x-value, you can linearly interpolate between index and index+1.
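
To make the interpolation step concrete, here is a minimal sketch. The arrays x_vals and y_vals and the variable target are made-up illustration data, not anything taken from your plot, and the code assumes the target lies strictly inside the range of y_vals (otherwise index + 1 runs past the end of the array). It does the interpolation by hand and then with np.interp, which should give the same result in one call.

```python
import bisect
import numpy as np

# Hypothetical CDF sampled on a grid; replace with your own arrays.
x_vals = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_vals = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
target = 0.8

# Index of the last y-value at or below the target.
index = bisect.bisect_right(y_vals, target) - 1

# Linear interpolation between the bracketing points (index, index + 1).
x0, x1 = x_vals[index], x_vals[index + 1]
y0, y1 = y_vals[index], y_vals[index + 1]
x_manual = x0 + (target - y0) * (x1 - x0) / (y1 - y0)

# np.interp performs the same lookup; y_vals must be increasing.
x_interp = np.interp(target, y_vals, x_vals)

print(index, x_manual, x_interp)
```

The resulting x-value can then be passed to matplotlib's axvline to draw the vertical line. If you have the raw samples rather than the plotted curve, np.quantile(data, 0.8) gives the empirical 80% point directly, though it may differ slightly from the KDE-smoothed curve.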

Upvotes: 1
