otteheng
otteheng

Reputation: 604

ValueError: Bin edges must be unique:

This question builds on another question that I recently asked. I have the following sample data set:

Year    A
1999    1
1999    3
1999    10
1999    11
2000    43
2000    39
2000    22
2000    21
2001    NaN
2001    NaN
2001    NaN
2001    NaN

I run the following code and get the ValueError: Bin edges must be unique:

dataset['Bins'] = dataset.groupby('Year')['A'].transform(lambda x: pd.cut(x, 2))
print (dataset)

Since this is grouped by year and 2001 has all missing values I understand why the error was raised. Some of the proposed solutions were to drop all of the duplicates but I want to keep all of the NaN's.

Ideally I would have a data set that looked like this:

Year    A   Bins
1999    1   (0.99, 6.0]
1999    3   (0.99, 6.0]
1999    10  (6.0, 11.0]
1999    11  (6.0, 11.0]
2000    43  (32.0, 43.0]
2000    39  (32.0, 43.0]
2000    22  (20.978, 32.0]
2000    21  (20.978, 32.0]
2001    NaN NaN
2001    NaN NaN
2001    NaN NaN
2001    NaN NaN

I'm not very familiar with lambda functions but is it possible to create a conditional statement within it to capture the cases where all values in a year are missing?

Upvotes: 1

Views: 1886

Answers (1)

Scott Boston
Scott Boston

Reputation: 153510

I think your problem are groups with all NaNs. Let's avoid this problem with the following:

df.groupby('Year')['A'].transform(lambda x: np.nan if x.isnull().all() else pd.cut(x, 2))

Output:

0        (0.99, 6.0]
1        (0.99, 6.0]
2        (6.0, 11.0]
3        (6.0, 11.0]
4       (32.0, 43.0]
5       (32.0, 43.0]
6     (20.978, 32.0]
7     (20.978, 32.0]
8                NaN
9                NaN
10               NaN
11               NaN
Name: A, dtype: object

Upvotes: 1

Related Questions