datadumn

Cumulative distribution function in numpy not reaching 1?

I am trying to plot a CDF over a histogram using matplotlib with the following code:

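import numpy as np
import matplotlib.pyplot as plt

# df is a pandas DataFrame with a numeric column named '0'
values, base = np.histogram(df['0'], bins=50)
cumulative = np.cumsum(values) / df['0'].sum()
# plot the cumulative function
plt.hist(df['0'], bins=50, density=True)
plt.plot(base[:-1], cumulative, c='blue')
plt.show()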

However, my plot ends up looking like this: the CDF levels off at roughly 0.007, when I would expect it to reach 1.

[Image: The plot I got]

I'm not sure what I'm doing wrong, and I'd appreciate any help.

Answers (1)

Andrea

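I think the problem is that you are normalizing the cumulative sum of the bin counts by the sum of the values in df['0']. The array values returned by np.histogram holds the number of occurrences of df['0'] inside each bin, so its cumulative sum ends at the total number of elements, N. Dividing by the sum of the data therefore makes the curve end at N / sum = 1 / mean(df['0']) instead of 1, which is why it levels off near 0.007.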
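A small numpy-only sketch can confirm this. It uses a hypothetical array data as a stand-in for df['0']; the mean of about 140 is an arbitrary assumption, chosen so that 1 / mean comes out near the 0.007 seen in the question.

```python
import numpy as np

# Hypothetical stand-in for df['0']: 1000 values with an assumed mean of about 140
rng = np.random.default_rng(0)
data = rng.normal(loc=140.0, scale=20.0, size=1000)

# counts[i] is the number of samples that fall in bin i
counts, edges = np.histogram(data, bins=50)

# Normalization used in the question: divide by the SUM of the data
wrong = np.cumsum(counts) / data.sum()

# The final entry is N / sum(data), which equals 1 / mean(data)
print(wrong[-1], 1 / data.mean())  # both roughly 0.007
```

Whatever the data, the buggy curve ends at 1 / mean rather than 1, so its final height only tells you the reciprocal of the column's mean.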

If you want to show the cumulative sum of the bins, you need to normalize it by the total number of elements in df['0'] (i.e. len(df['0'])):

cumulative = np.cumsum(values)/df['0'].values.shape[0]
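Here is a minimal sketch of the corrected normalization, again assuming a hypothetical array data in place of df['0']. It also uses the right bin edges for the x-axis; that is a suggested choice rather than part of the fix above, since the cumulative count through bin i covers everything up to that bin's right edge.

```python
import numpy as np

# Hypothetical stand-in for df['0'] (a 1-D numeric array or pandas Series works)
rng = np.random.default_rng(0)
data = rng.normal(loc=140.0, scale=20.0, size=1000)

counts, edges = np.histogram(data, bins=50)

# Normalize by the number of samples, not their sum.
# len(data) is the same as data.shape[0], i.e. df['0'].values.shape[0].
cdf = np.cumsum(counts) / len(data)

# cdf[i] is the fraction of samples in bins 0..i, i.e. up to the right
# edge edges[i + 1], so edges[1:] lines the curve up with the bars.
x = edges[1:]

print(cdf[-1])  # 1.0
```

The result can then be drawn with plt.plot(x, cdf) exactly as in the question. If a cumulative histogram is enough, matplotlib can also draw it directly with plt.hist(df['0'], bins=50, density=True, cumulative=True, histtype='step'); with both density and cumulative set, the last bin equals 1.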