George Zorikov

Reputation: 139

Cumulative distribution function via plt.hist()

I have data and I want to plot its empirical cumulative distribution function. I took a piece of code from the official matplotlib site, which uses a histogram to plot the step function.

import numpy as np
import matplotlib.pyplot as plt

data = np.array([5, 8, 5, 9, 10, 15, 7, 12, 19, 21, 7, 10, 11,
    13, 18, 20, 20, 14, 15, 15, 21, 3, 8, 13, 14, 14, 15,
    14, 17, 24, 22, 28, 24, 22, 25, 16, 21, 24, 18, 20])

# Normalized cumulative histogram drawn as a step line
hist_cum, bin_edges, patches = plt.hist(data, bins='sturges', density=True,
                                         histtype='step', cumulative=True)

Output: histogram

The problem is that there is a single 28 in the data. The formula says F(x) = P{X < x}, with strict inequality, which means the function can't be 1 to the left of x = 28.

I cannot understand how to fix it.

Upvotes: 0

Views: 767

Answers (1)

AirSquid

Reputation: 11938

A couple of things. First, I think your understanding of the CDF is shaky. For an empirical distribution, the CDF is 1.0 for all x > max(your data), right? The probability that a random sample X from this distribution is less than a particular point x, far off to the high side of the plot, is 1.0.
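To make that concrete, here is a minimal sketch that evaluates the empirical CDF of the question's data directly, under both the strict convention P{X < x} and the more common P{X <= x}. The helper names `ecdf_strict` and `ecdf_inclusive` are made up for illustration, and the sketch uses the standard-library `bisect` module rather than anything from matplotlib.

```python
from bisect import bisect_left, bisect_right

# Same values as in the question, as a plain list
data = [5, 8, 5, 9, 10, 15, 7, 12, 19, 21, 7, 10, 11,
        13, 18, 20, 20, 14, 15, 15, 21, 3, 8, 13, 14, 14, 15,
        14, 17, 24, 22, 28, 24, 22, 25, 16, 21, 24, 18, 20]

sorted_data = sorted(data)
n = len(sorted_data)


def ecdf_strict(x):
    """Fraction of samples strictly less than x, i.e. P{X < x}."""
    return bisect_left(sorted_data, x) / n


def ecdf_inclusive(x):
    """Fraction of samples less than or equal to x, i.e. P{X <= x}."""
    return bisect_right(sorted_data, x) / n


print(ecdf_strict(28))     # 39 of the 40 samples are below 28
print(ecdf_inclusive(28))  # all 40 samples are <= 28
print(ecdf_strict(28.5))   # any x above the maximum gives 1.0
```

The two conventions differ only at the data points themselves. For any x above the largest sample (28 here), both return 1.0.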

That said, I think what you are looking to do is control the axis limits of your plot. Try tinkering with these commands before rendering the plot:

plt.xlim(0, 28)
plt.xticks(np.arange(0, 30, 2))

Upvotes: 1
