Amani

Reputation: 18123

Pandas' cut does not work as expected

I apply the following cut to a DataFrame:

df['Age_Groups'] = pd.cut(df.Age, [0, 60, 120, 240, 360, 480, 600, 720, 940], 
                                labels=['0-5', '5-10', '11-20', '21-30', '31-40', '41-50', '51-60', '> 60'])

Does this mean that values from 0 to 60 fall into '0-5'? For example, is 60 excluded from '0-5', or is 0?

Upvotes: 0

Views: 3394

Answers (1)

B. M.

Reputation: 18628

You must make your bins match your labels:

import pandas as pd

df = pd.DataFrame({'Age': range(11)})
df['Age_Groups'] = pd.cut(df.Age, [0, 6, 10], labels=['0-5', '6-10'], right=False)

"""                                
    Age Age_Groups
0     0        0-5
1     1        0-5
2     2        0-5
3     3        0-5
4     4        0-5
5     5        0-5
6     6       6-10
7     7       6-10
8     8       6-10
9     9       6-10
10   10        NaN
"""

From the docs: by default, the left edge of each bin is excluded and the right edge is included:

right : bool, optional
    Indicates whether the bins include the rightmost edge or not. If right == True (the default), then the bins [1,2,3,4] indicate (1,2], (2,3], (3,4].
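The documented default can be checked directly: the categories returned by `pd.cut` are `pd.Interval` objects, and their `closed` attribute reports which side is included. A minimal sketch using the bins from the docs quote (the three sample values are made up for illustration):

```python
import pandas as pd

# Bin three sample values with the edges from the docs quote.
default_cats = pd.cut([1.5, 2.5, 3.5], bins=[1, 2, 3, 4]).categories
left_cats = pd.cut([1.5, 2.5, 3.5], bins=[1, 2, 3, 4], right=False).categories

print(list(default_cats))   # three right-closed intervals
print(default_cats.closed)  # right
print(left_cats.closed)     # left

# Membership makes the difference concrete for the edge value 2.
print(2 in default_cats[0])  # True: (1, 2] contains 2
print(2 in left_cats[0])     # False: [1, 2) does not contain 2
```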

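Here, with right=False, the edges [0, 6, 10] become the bins [0, 6) and [6, 10): the left edge is included and the right edge is excluded, which is why 10 ends up as NaN.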
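Applied to the first edges from the question, [0, 60, 120], the boundary values behave as follows. This sketch also tries `include_lowest=True`, a documented `pd.cut` parameter that makes the first interval include its left edge; the probe values are chosen for illustration.

```python
import pandas as pd

edges = [0, 60, 120]       # first edges from the question's bins
values = [0, 60, 61, 120]  # boundary values to probe

# Default right=True: bins are (0, 60] and (60, 120], so 0 gets no bin.
default = pd.cut(values, edges)
# right=False: bins are [0, 60) and [60, 120), so 120 gets no bin.
left_closed = pd.cut(values, edges, right=False)
# include_lowest=True: like the default, but the first bin also takes 0.
lowest = pd.cut(values, edges, include_lowest=True)

# .codes gives the bin index per value; -1 means "no bin" (NaN).
print(default.codes.tolist())      # [-1, 0, 1, 1]
print(left_closed.codes.tolist())  # [0, 1, 1, -1]
print(lowest.codes.tolist())       # [0, 0, 1, 1]
```

So with the question's original call, an age of 60 lands in '0-5', while an age of 0 gets no label unless `include_lowest=True` is passed.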

Upvotes: 1
