Reputation: 95
I have a data frame with true/false values stored in string format. Some values are null in the data frame.
I need to encode this data such that TRUE/FALSE/null values are encoded with the same integer in every column.
Input:
col1 col2 col3
True True False
True True True
null null True
I am using:
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
df.apply(le.fit_transform)
Output:
2 1 0
2 1 1
1 0 1
But I want the output as:
2 2 0
2 2 2
1 1 2
How do I do this?
Upvotes: 2
Views: 290
Reputation: 862406
For me, it works to reshape the data into a one-column DataFrame first, so the encoder is fitted once on all values:
df = df.stack(dropna=False).to_frame().apply(le.fit_transform)[0].unstack()
print (df)
col1 col2 col3
0 1 1 0
1 1 1 1
2 2 2 1
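An alternative to reshaping, shown here only as a minimal sketch: fit one encoder on every value in the frame, then reuse it to transform each column. It assumes the nulls are the string 'null' and rebuilds the question's sample data; the name encoded is introduced just for this illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data from the question, assuming nulls are the string 'null'.
df = pd.DataFrame({
    "col1": ["True", "True", "null"],
    "col2": ["True", "True", "null"],
    "col3": ["False", "True", "True"],
})

le = LabelEncoder()
# Fit once on all values so every column shares the same classes.
le.fit(df.to_numpy().ravel())
print(le.classes_)  # ['False' 'True' 'null'] (sorted order)

# Reuse the fitted encoder for every column (transform only, no refit).
encoded = df.apply(le.transform)
print(encoded)
```

Because the encoder is fitted only once, a given string gets the same integer in every column, which is what the per-column fit_transform in the question cannot guarantee.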
Another idea is to use DataFrame.replace with the string 'True' instead of the boolean True, because the question says:
I have a data frame with true/false values stored in string format.
If the nulls are missing values (NaN):
df = df.replace({'True':2, 'False':1, np.nan:0})
If the nulls are the string 'null':
df = df.replace({'True':2, 'False':1, 'null':0})
print (df)
col1 col2 col3
0 2 2 1
1 2 2 2
2 0 0 2
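If it is not clear which kind of null the data holds, one possible variation is to normalise missing values to the string 'null' first and then map through an explicit dictionary. This is only a sketch: the sample frame deliberately mixes NaN and 'null', and the names codes and encoded are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data mixing a real missing value (NaN) and the string 'null'.
df = pd.DataFrame({
    "col1": ["True", "True", np.nan],
    "col2": ["True", "True", "null"],
    "col3": ["False", "True", "True"],
})

# Explicit codes, the same for every column.
codes = {"True": 2, "False": 1, "null": 0}

# Turn NaN into 'null', then look each value up in the dictionary.
encoded = df.fillna("null").apply(lambda col: col.map(codes))
print(encoded)
```

Unlike replace, Series.map turns any value missing from the dictionary into NaN, so an unexpected string shows up as a gap instead of passing through unchanged.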
Upvotes: 5