In the Examples section for pandas.cut, the following is mentioned:
Discretize into three equal-sized bins.
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
How are the bins equal sized? The first bin seems larger...
Note, per the docs:
int : Defines the number of equal-width bins in the range of x. The range of x is extended by .1% on each side to include the minimum and maximum values of x.
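So the bins really are equal-width: the range 1 to 7 is split into three bins of width 2, with edges at 1, 3, 5 and 7. The only difference is that the lowest edge is pushed down by 0.1% of the range (6 × 0.001 = 0.006) to 0.994. Because the bins are closed on the right by default, (1.0, 3.0] would not contain the lowest value, 1; the extended edge makes it fall inside the first bin.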
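The edge arithmetic for the default right=True case can be reproduced by hand. This is a minimal NumPy sketch, not pandas' own code: it assumes, based on the output above, that only the lowest edge is moved, and the names `edges`, `adj` and `right_edges` are illustrative.

```python
import numpy as np

x = np.array([1, 7, 5, 4, 6, 3])
nbins = 3

# Equal-width edges spanning [min, max]; width = (7 - 1) / 3 = 2
edges = np.linspace(x.min(), x.max(), nbins + 1)
print(edges)  # [1. 3. 5. 7.]

# 0.1% of the range (7 - 1), i.e. about 0.006
adj = (x.max() - x.min()) * 0.001

# With the default right=True the bins are (a, b], so (1, 3] would not
# contain the minimum 1. Pushing only the lowest edge down fixes that.
right_edges = edges.copy()
right_edges[0] -= adj
print(right_edges)  # first edge is now about 0.994; the rest are unchanged
```

The interior edges stay at exactly 3 and 5, which is why the category labels look uneven only at the low end.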
Now watch what happens if we close the bins on the other side with right=False:
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, right=False)
Output:
[[1.0, 3.0), [5.0, 7.006), [5.0, 7.006), [3.0, 5.0), [5.0, 7.006), [3.0, 5.0)]
Categories (3, interval[float64, left]): [[1.0, 3.0) < [3.0, 5.0) < [5.0, 7.006)]
This time the top edge is extended instead, to 7.006, so that the highest value, 7, falls inside the last bin [5.0, 7.006).
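To inspect the edges pd.cut actually computed, you can pass retbins=True, which makes it return the bin edges alongside the categorical. A short sketch comparing both settings (variable names are illustrative):

```python
import numpy as np
import pandas as pd

x = np.array([1, 7, 5, 4, 6, 3])

# retbins=True makes pd.cut also return the computed bin edges
_, right_bins = pd.cut(x, 3, retbins=True)              # intervals are (a, b]
_, left_bins = pd.cut(x, 3, right=False, retbins=True)  # intervals are [a, b)

print(right_bins)  # lowest edge pulled down to about 0.994
print(left_bins)   # highest edge pushed up to about 7.006

# The interior bins keep the nominal width (7 - 1) / 3 = 2; only the
# outermost edge on the open side absorbs the 0.1% adjustment.
print(np.diff(right_bins))
print(np.diff(left_bins))
```

Only one outer bin is about 0.3% wider than the others, so "equal-sized" holds up to that small adjustment.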