Farhan Panja

Reputation: 53

Data lost while creating bins using the pandas cut function in Python

My objective is to transfer one column from df1 to df2 while creating bins at the same time. I have a dataframe named df1 which includes 3 numerical variables. I want to fetch one variable named 'tenure' into df2 and create bins from it. The column values are transferred to df2, but df2 shows some missing values. Please find the code below:

df2=pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'])

Before creating df2 I checked for missing values in df1; there were no such missing values, but after creating the bins it shows 11 missing values.

print(df2.isnull().sum())

The code above shows 11 missing values.
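
To see which original tenure values end up as NaN after binning (df2 here being the Series returned by pd.cut above), a quick check could be:

# show the original tenure values for the rows that became NaN after binning
print(df1.loc[df2.isnull(), "tenure"])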

Any help is appreciated.

Upvotes: 1

Views: 1397

Answers (1)

Ben.T

Reputation: 29635

I assume you have some values in df1['tenure'] that are not in (0,80], maybe the zeros. See the example below:

df1 = pd.DataFrame({'tenure':[-1, 0, 12, 34, 78, 80, 85]})
print (pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high']))

0       NaN    # -1 is lower than 0 so result is null
1       NaN    # it was 0 but the segment is open on the lowest bound so 0 gives null
2       low
3    medium
4      high
5      high    # 80 is kept as the segment is closed on the right
6       NaN    # 85 is higher than 80 so result is null
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]

Now, you can pass the parameter include_lowest=True in pd.cut to keep the left bound in the result:

print (pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'],
              include_lowest=True))

0       NaN
1       low  # now where the value was 0 you get low and not null
2       low
3    medium
4      high
5      high
6       NaN
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]

So finally, I think that if you print len(df1[(df1.tenure <= 0) | (df1.tenure > 80)]) you will get 11 with your data, matching the number of null values in your df2 (here it is 3 with my sample data).
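
As a rough sketch using the same sample df1 as above, the out-of-range check and one possible workaround (open-ended bins so no value falls outside the edges; the exact edges for your real data are an assumption) might look like this:

import pandas as pd

df1 = pd.DataFrame({'tenure': [-1, 0, 12, 34, 78, 80, 85]})

# count the values that fall outside (0, 80] and therefore become NaN after pd.cut
print(len(df1[(df1.tenure <= 0) | (df1.tenure > 80)]))   # prints 3 for this sample data

# one possible workaround: open-ended bins so every value gets a label
print(pd.cut(df1["tenure"], bins=[float('-inf'), 20, 60, float('inf')],
             labels=['low', 'medium', 'high']))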

Upvotes: 1
