Reputation: 53
My objective is to copy one column from df1 into df2 and bin it at the same time. I have a dataframe named df1 that includes 3 numerical variables. I want to take the variable 'tenure', bin it, and store the result in df2. The column values are transferred, but df2 shows some missing values. Please find the code below:
df2=pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'])
Before creating df2 I checked df1 for missing values and there were none, but after creating the bins df2 shows 11 missing values.
print(df2.isnull().sum())
The code above reports 11 missing values.
Any help is appreciated.
Upvotes: 1
Views: 1397
Reputation: 29635
I assume you have some values in df1['tenure'] that are not in (0,80], maybe the zeros. See the example below:
import pandas as pd
df1 = pd.DataFrame({'tenure':[-1, 0, 12, 34, 78, 80, 85]})
print(pd.cut(df1["tenure"], bins=[0,20,60,80], labels=['low','medium','high']))
0 NaN # -1 is lower than 0 so result is null
1 NaN # it was 0 but the segment is open on the lowest bound so 0 gives null
2 low
3 medium
4 high
5 high # 80 is kept as the segment is closed on the right
6 NaN # 85 is higher than 80 so result is null
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]
Now, you can pass the parameter include_lowest=True to pd.cut so that the left bound of the first bin is included in the result:
print (pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'],
include_lowest=True))
0 NaN
1 low # now where the value was 0 you get low and not null
2 low
3 medium
4 high
5 high
6 NaN
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]
So finally, I think that if you print len(df1[(df1.tenure <= 0) | (df1.tenure > 80)]) you will get 11 with your data, which is the number of null values in your df2 (here it is 3 with my data).
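To check this on your own data, something like the following sketch could help. It uses the same made-up sample values as above (not your real df1), and the names `bins`, `binned` and `outside` are only illustrative. It compares the number of nulls produced by pd.cut with the number of rows falling outside (0,80], and prints those rows so you can see which values are responsible:

```python
import pandas as pd

# Made-up sample data, same values as in the example above (an assumption,
# replace with your own df1)
df1 = pd.DataFrame({"tenure": [-1, 0, 12, 34, 78, 80, 85]})
bins = [0, 20, 60, 80]

# Same call as in the question: intervals are (0,20], (20,60], (60,80]
binned = pd.cut(df1["tenure"], bins=bins, labels=["low", "medium", "high"])

# Rows that fall outside (0,80] and therefore cannot be placed in any bin
outside = df1[(df1["tenure"] <= bins[0]) | (df1["tenure"] > bins[-1])]

n_missing = int(binned.isnull().sum())
n_outside = len(outside)

print(n_missing, n_outside)  # the two counts should match
print(outside)               # the values that became null
```

If the two counts match on your data, the nulls come only from out-of-range values, and you can then decide whether to widen the bin edges or use include_lowest=True.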
Upvotes: 1