Reputation: 604
This question builds on another question that I recently asked. I have the following sample data set:
Year A
1999 1
1999 3
1999 10
1999 11
2000 43
2000 39
2000 22
2000 21
2001 NaN
2001 NaN
2001 NaN
2001 NaN
I run the following code and get ValueError: Bin edges must be unique:
dataset['Bins'] = dataset.groupby('Year')['A'].transform(lambda x: pd.cut(x, 2))
print(dataset)
Since this is grouped by year and 2001 has all missing values, I understand why the error was raised. Some of the proposed solutions were to drop all of the duplicates, but I want to keep all of the NaNs.
Ideally I would have a data set that looked like this:
Year A Bins
1999 1 (0.99, 6.0]
1999 3 (0.99, 6.0]
1999 10 (6.0, 11.0]
1999 11 (6.0, 11.0]
2000 43 (32.0, 43.0]
2000 39 (32.0, 43.0]
2000 22 (20.978, 32.0]
2000 21 (20.978, 32.0]
2001 NaN NaN
2001 NaN NaN
2001 NaN NaN
2001 NaN NaN
I'm not very familiar with lambda functions, but is it possible to create a conditional statement within one to capture the cases where all values in a year are missing?
Upvotes: 1
Views: 1886
Reputation: 153510
I think your problem is the groups that contain only NaNs. You can avoid the error by returning NaN for those groups instead of calling pd.cut on them:
dataset.groupby('Year')['A'].transform(lambda x: np.nan if x.isnull().all() else pd.cut(x, 2))
Output:
0 (0.99, 6.0]
1 (0.99, 6.0]
2 (6.0, 11.0]
3 (6.0, 11.0]
4 (32.0, 43.0]
5 (32.0, 43.0]
6 (20.978, 32.0]
7 (20.978, 32.0]
8 NaN
9 NaN
10 NaN
11 NaN
Name: A, dtype: object
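For reference, here is a self-contained sketch that rebuilds the sample data from the question (the frame name `dataset` and values are taken from the question itself) and applies the guarded transform, assigning the result back to a `Bins` column as the question wanted:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data set from the question.
dataset = pd.DataFrame({
    'Year': [1999] * 4 + [2000] * 4 + [2001] * 4,
    'A': [1, 3, 10, 11, 43, 39, 22, 21,
          np.nan, np.nan, np.nan, np.nan],
})

# For each year, bin A into 2 equal-width bins; if every value in the
# group is NaN (as in 2001), return NaN instead of calling pd.cut,
# which would otherwise raise "ValueError: Bin edges must be unique".
dataset['Bins'] = dataset.groupby('Year')['A'].transform(
    lambda x: np.nan if x.isnull().all() else pd.cut(x, 2)
)

print(dataset)
```

The 1999 and 2000 rows get their per-year intervals, while the four 2001 rows keep NaN in `Bins`, matching the desired output in the question.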
Upvotes: 1