Getting CDF of variable-sized numpy arrays in Python using same bins?

Question

I'd like to make a set of comparable empirical CDFs for a few numpy arrays (each of different length) and store these in a pandas dataframe:

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

a = np.random.randn(100)
b = np.random.randn(500)
# ECDF from statsmodels
cdf_a = ECDF(a)
cdf_b = ECDF(b)

The problem is that cdf_a.x and cdf_a.y have different lengths from cdf_b.x and cdf_b.y. I would like them to be the same length, i.e. use the same bins to compute both CDFs, so that they can be plotted on the same scale from a pandas DataFrame. The following does not work:

df = pandas.DataFrame({"cdf_a": cdf_a.y, "cdf_b": cdf_b.y})

This fails because the two CDFs are not the same length. How can I use the same bins for a and b when computing their CDFs, so that I get comparable vectors of the same length back?

Is this the best solution?

bins = np.linspace(0, 1, 10)
v1 = cdf_a(bins)
v2 = cdf_b(bins)

user248237 · Accepted Answer

Evaluating both ECDF objects on the same set of points appears to be a good solution:

bins = np.linspace(0, 1, 10)
v1 = cdf_a(bins)
v2 = cdf_b(bins)

Then len(v1) == len(v2), and v1 and v2 can be plotted as the CDFs of a and b on the same scale.
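One caveat: np.linspace(0, 1, 10) only covers the interval [0, 1], while standard normal samples like a and b mostly fall between about -3 and 3, so a grid spanning the combined range of the data may be more useful. Below is a minimal sketch of that idea. The helper ecdf_on_grid is hypothetical (it is not part of statsmodels); it uses np.searchsorted to compute the fraction of samples less than or equal to each grid point, which should match what calling a statsmodels ECDF object on the grid returns with its default settings. The grid size of 50 and the seed are arbitrary choices.

```python
import numpy as np
import pandas as pd


def ecdf_on_grid(sample, grid):
    """Hypothetical helper: fraction of `sample` values <= each point of `grid`."""
    sorted_sample = np.sort(np.asarray(sample))
    # side="right" counts how many sample values are <= each grid point
    counts = np.searchsorted(sorted_sample, grid, side="right")
    return counts / sorted_sample.size


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
a = rng.standard_normal(100)
b = rng.standard_normal(500)

# Shared grid covering the combined range of both samples
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
bins = np.linspace(lo, hi, 50)

# Both columns have len(bins) rows, so they fit in one DataFrame
df = pd.DataFrame(
    {"cdf_a": ecdf_on_grid(a, bins), "cdf_b": ecdf_on_grid(b, bins)},
    index=pd.Index(bins, name="x"),
)
```

Because the grid values are stored as the index, df.plot() draws both curves against the same x axis. With the statsmodels objects from the question, cdf_a(bins) and cdf_b(bins) could be used in place of the helper.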
