Arij SEDIRI

Reputation: 2158

Qcut Pandas : ValueError: Bin edges must be unique

I'm using pd.qcut from pandas to discretize my data into equal-sized price buckets. This is my DataFrame:

        productId   sell_prix   categ   popularity
11997   16758760.0  28.75        50      524137.0
11998   16758760.0  28.75        50      166795.0
13154   16782105.0  24.60        50      126890.5
13761   16790082.0  65.00        50      245437.0
13762   16790082.0  65.00        50      245242.0
15355   16792720.0  29.00        50      360219.0
15356   16792720.0  29.00        50      360100.0
15357   16792720.0  29.00        50      360027.0
15358   16792720.0  29.00        50      462850.0
15367   16792728.0  29.00        50      193030.5

And this is my code :

df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)

I have this error message :

**ValueError: Bin edges must be unique: array([ 24.6,  29. ,  29. ,  65. ])**

In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that the same code works on a DataFrame with 359824 rows of practically the same data. Does the result depend on the length of the DataFrame?

Any help is appreciated. Many thanks.

Upvotes: 6

Views: 11496

Answers (1)

luca

Reputation: 7546

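Various solutions are discussed here, but briefly: rank the values first with rank(method='first'), which breaks ties so that the quantile edges become unique, and bin the ranks instead of the raw values: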

> pd.qcut(df['a'].rank(method='first'), 3)
0        [1, 2.333]
1        [1, 2.333]
2    (2.333, 3.667]
3        (3.667, 5]
4        (3.667, 5]

Or

> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0    0
1    0
2    1
3    2
4    2
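
As a rough sketch of how this could apply to the data in the question, the snippet below rebuilds the ten `sell_prix` values from the sample and tries two approaches: binning the ranks as above, and passing `duplicates="drop"` to `pd.qcut`. It assumes a pandas version that supports the `duplicates` parameter (0.20 or later); the variable names `prices`, `edges`, `by_rank` and `by_drop` are made up for illustration. The bin edges are quantiles of the column, so the error is driven by how often a single value repeats rather than by the number of rows as such.

```python
import pandas as pd

# Sample prices copied from the question's `sell_prix` column
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# The candidate bin edges are quantiles of the data. When one value
# (here 29.00) is heavily repeated, two of the edges can coincide.
edges = prices.quantile([0, 1 / 3, 2 / 3, 1])
print(edges)

# Option 1: bin the ranks instead of the raw values. method="first"
# gives every row a distinct rank, so the edges stay unique here.
by_rank = pd.qcut(prices.rank(method="first"), 3, labels=False)

# Option 2: keep the raw values and let pandas drop repeated edges.
# This can yield fewer buckets than the number requested.
by_drop = pd.qcut(prices, 3, duplicates="drop")

print(by_rank.value_counts().sort_index())
print(by_drop.cat.categories)
```

The two options trade off differently: ranking keeps the buckets close to equal in size but may place identical prices in different buckets, while `duplicates="drop"` keeps identical prices together at the cost of uneven or fewer buckets.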

Upvotes: 6
