Reputation: 45
I am trying to plot a CDF over a histogram using matplotlib with the following code:
values, base = np.histogram(df['0'], bins=50)
cumulative = np.cumsum(values) / df['0'].sum()
# plot the cumulative function
plt.hist(df['0'], bins=50, density=True)
plt.plot(base[:-1], cumulative, c='blue')
plt.show()
However, my plot ends up looking like this, with the CDF topping out around 0.007 when I would expect it to reach 1:
I'm not sure what I'm doing wrong, but I'd appreciate any help.
Upvotes: 1
Views: 472
Reputation: 3077
I think the problem is that you are normalizing the cumulative sum of the bin counts by the sum of the values in your dataframe. The quantity stored in values is the number of occurrences of df['0'] inside the corresponding bin, so its cumulative sum ends at the number of elements, not at their sum.
If you want to show the cumulative sum of the bins, you need to normalize it by the total number of elements of df['0']:
cumulative = np.cumsum(values)/df['0'].values.shape[0]
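For reference, here is a minimal self-contained sketch of the same fix. It uses synthetic normal data in place of df['0'], and the helper names empirical_cdf and plot_hist_with_cdf are made up for illustration. It also makes two choices that are assumptions rather than part of the fix: the CDF is plotted against the right bin edges (the cumulative count applies at the end of each bin), and it is drawn on a secondary y-axis so the density bars and the 0-to-1 curve do not fight over one scale.

```python
import numpy as np


def empirical_cdf(data, bins=50):
    """Return (bin edges, cumulative fraction of samples up to each right edge)."""
    data = np.asarray(data)
    counts, edges = np.histogram(data, bins=bins)
    # counts holds the number of samples per bin, so divide by the number
    # of samples (not by the sum of the sample values).
    cdf = np.cumsum(counts) / data.size
    return edges, cdf


def plot_hist_with_cdf(data, bins=50):
    """Draw a density histogram with the empirical CDF on a second y-axis."""
    import matplotlib.pyplot as plt  # imported here so the CDF helper works without it

    edges, cdf = empirical_cdf(data, bins=bins)
    fig, ax = plt.subplots()
    ax.hist(data, bins=edges, density=True)
    ax_cdf = ax.twinx()  # separate scale for the 0..1 curve
    ax_cdf.plot(edges[1:], cdf, c='blue')
    ax_cdf.set_ylim(0, 1.05)
    plt.show()


# Synthetic stand-in for df['0'] (an assumption; any 1-D numeric data works).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)
edges, cdf = empirical_cdf(sample)
print(cdf[-1])  # the last value is the fraction of all samples, i.e. 1.0
```

Calling plot_hist_with_cdf(df['0']) should then give a curve that ends at 1. If the column can contain NaN values, drop them first (for example with df['0'].dropna()), since they would otherwise be counted in the denominator.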
Upvotes: 3