Reputation: 139
I have some data and I want to plot its empirical cumulative distribution function. I took a piece of code from the official matplotlib site, which uses a histogram to draw the step function.
import numpy as np
import matplotlib.pyplot as plt
data = np.array([5, 8, 5, 9, 10, 15, 7, 12, 19, 21, 7, 10, 11,
                 13, 18, 20, 20, 14, 15, 15, 21, 3, 8, 13, 14, 14, 15,
                 14, 17, 24, 22, 28, 24, 22, 25, 16, 21, 24, 18, 20])
# Cumulative, normalized step histogram; bin edges are chosen by Sturges' rule
hist_cum, bin_edges, patches = plt.hist(data, bins='sturges', density=True,
                                        histtype='step', cumulative=True)
Output: histogram
The problem is that there is a single 28 in the data. The formula says F(x) = P{X < x}, with a strict inequality. That means the function cannot be equal to 1 to the left of x = 28.
I cannot understand how to fix it.
Upvotes: 0
Views: 767
Reputation: 11938
A couple of things. First, I think your understanding of the CDF is shaky. For an empirical distribution, the CDF is 1.0 for all x > max(your data), right? The probability that a random sample X from this distribution is less than a particular point x, way off to the high side of the plot, is 1.0.
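To check this numerically, here is a minimal sketch that evaluates the empirical CDF straight from the sorted sample, under both the strict convention from the question (P{X < x}) and the more common non-strict one (P{X <= x}). The helper name `ecdf` and its `strict` flag are made up for illustration; they are not part of NumPy or matplotlib.

```python
import numpy as np

data = np.array([5, 8, 5, 9, 10, 15, 7, 12, 19, 21, 7, 10, 11,
                 13, 18, 20, 20, 14, 15, 15, 21, 3, 8, 13, 14, 14, 15,
                 14, 17, 24, 22, 28, 24, 22, 25, 16, 21, 24, 18, 20])


def ecdf(sample, x, strict=False):
    """Hypothetical helper: fraction of sample values < x (strict) or <= x."""
    sorted_sample = np.sort(sample)
    # side="left" counts values strictly below x; side="right" counts values <= x
    side = "left" if strict else "right"
    return np.searchsorted(sorted_sample, x, side=side) / sorted_sample.size


print(ecdf(data, 28, strict=True))   # 39 of the 40 values are below 28
print(ecdf(data, 28))                # all 40 values are <= 28
print(ecdf(data, 30, strict=True))   # past the maximum, both conventions agree
```

The two conventions differ only at the data values themselves; anywhere to the right of the maximum, both give 1.0.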
That said, I think what you are looking to do is control the axis limits of your plot. Try tinkering with these commands before rendering the plot:
plt.xlim(0, 28)
plt.xticks(np.arange(0,30,2))
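If the concern is that the binned histogram reaches 1.0 across the whole last bin rather than exactly at 28, one alternative (a sketch, not what the matplotlib example does) is to skip binning and draw the step function from the data values themselves. The variable names and axis limits below are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([5, 8, 5, 9, 10, 15, 7, 12, 19, 21, 7, 10, 11,
                 13, 18, 20, 20, 14, 15, 15, 21, 3, 8, 13, 14, 14, 15,
                 14, 17, 24, 22, 28, 24, 22, 25, 16, 21, 24, 18, 20])

# Sorted unique values and how many times each one occurs
values, counts = np.unique(data, return_counts=True)
# Fraction of the sample that is <= each unique value
cum_prob = np.cumsum(counts) / data.size

fig, ax = plt.subplots()
# where="post" holds each level until the next data value is reached
ax.step(values, cum_prob, where="post")
ax.set_xlim(0, 30)
ax.set_xticks(np.arange(0, 32, 2))
ax.set_xlabel("x")
ax.set_ylabel("empirical CDF")
plt.show()
```

Drawn this way, the curve stays at 39/40 up to x = 28 and only jumps to 1.0 at 28 itself. The strict and non-strict definitions produce the same picture and differ only in which level is assigned exactly at each jump.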
Upvotes: 1