An Ignorant Wanderer

Reputation: 1612

Why does Pandas cut return unequal sized bins?

In the Examples section for pandas.cut, the following is mentioned:

Discretize into three equal-sized bins.

pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)

[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...

How are the bins equal sized? The first bin seems larger...

Upvotes: 0

Views: 133

Answers (1)

Scott Boston

Reputation: 153460

Note per docs:

int : Defines the number of equal-width bins in the range of x. The range of x is extended by .1% on each side to include the minimum and maximum values of x.

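So, because the default intervals are open on the left, it is extending the lower edge of the first bin from 1 to 0.994 (0.1% of the range 7 - 1 = 6 is 0.006) to capture the lowest value, 1. That is why the first bin looks slightly larger.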
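
To see where 0.994 comes from, here is a small sketch that asks pd.cut for its edges via retbins=True and compares the first edge with a hand-computed padding. The names edges and pad are just for illustration, and the 0.1% figure is taken from the docs quoted above:

```python
import numpy as np
import pandas as pd

x = np.array([1, 7, 5, 4, 6, 3])

# retbins=True makes pd.cut return the bin edges as a second value
_, edges = pd.cut(x, 3, retbins=True)

# 0.1% of the range of x, per the docs: (7 - 1) * 0.001
pad = (x.max() - x.min()) * 0.001

print(edges)           # edges of the three bins
print(x.min() - pad)   # should match the first edge
print(np.diff(edges))  # bin widths; only the first one is slightly wider
```

Apart from that one padded edge, the bins are equal width (2.0 each here).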

Now watch what happens if we close on the other side with right=False:

pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, right=False)

Output:

[[1.0, 3.0), [5.0, 7.006), [5.0, 7.006), [3.0, 5.0), [5.0, 7.006), [3.0, 5.0)]
Categories (3, interval[float64, left]): [[1.0, 3.0) < [3.0, 5.0) < [5.0, 7.006)]

Here the intervals are open on the right, so the top edge is extended from 7 to 7.006 to capture the highest value, 7.
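
The same kind of check works for right=False. This is only a sketch using retbins=True, and edges_left is a name made up here:

```python
import numpy as np
import pandas as pd

x = np.array([1, 7, 5, 4, 6, 3])

# with right=False the intervals are closed on the left, open on the right
_, edges_left = pd.cut(x, 3, right=False, retbins=True)

print(edges_left)           # the padding now sits on the last edge
print(np.diff(edges_left))  # only the last bin is slightly wider
```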

Upvotes: 1
