Reputation: 11
I'm currently struggling with Python's LabelEncoder.
I tried to apply it to a data set in which missing values were filled either with the mode or, for columns with larger amounts of missing values, with made-up placeholder values.
Encoding the data set works perfectly fine when I only impute with the mode, median, or mean. But as soon as a column partly consists of values I imputed myself (e.g. 9999) to create a new category, the LabelEncoder does not seem to work.
Following error occurs:
TypeError: '<' not supported between instances of 'str' and 'float'
I know this error can occur when stray symbols or NaNs are still in the data set, and I can assure you that I checked for all of them. That's why I have no idea how to move on.
Any idea what the problem is here?
To make the issue clearer, here is what I did: Y and N stand for Yes and No and are the original values of the column. I grouped all missing values under U to create a new category. I did the same with Unknown or 99999.
Upvotes: 0
Views: 119
Reputation: 12165
As the error message says, you're trying to compare a float (a number) and a string (text) with a comparison operator, which makes no sense (e.g., which is greater: 5 or fish?). The problem is that when you add these values yourself, you think you're adding the number 9999, but what's actually happening is that Python is inserting a string containing the text '9999'.
The solution is to use the float() function to ensure that manually inserted labels are stored as floating-point numbers. You could also use it wherever you make numeric comparisons to ensure the data types are correct.
>>> x = '5.5'
>>> 1 + x
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 1 + float(x)
6.5
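To connect this back to the encoder: as far as I know, a label encoder has to sort the distinct labels of a column, and that sort is where the '<' comparison fails when strings and floats are mixed. The snippet below is only a minimal sketch in plain Python. The column contents and the helper names find_types and encode are made up for illustration, and encode is a stand-in, not scikit-learn's LabelEncoder. Because Y and N cannot be converted with float(), the sketch casts everything to str instead; the point in either direction is that the column ends up with a single type.

```python
# Hypothetical column: 'Y'/'N' strings plus a float placeholder (assumed data).
column = ["Y", "N", 9999.0, "Y"]


def find_types(values):
    """Return the set of Python type names present in the column."""
    return {type(v).__name__ for v in values}


def encode(values):
    """Stand-in for a label encoder: sort distinct labels, map them to ints."""
    classes = sorted(set(values))  # raises TypeError on mixed str/float
    lookup = {label: i for i, label in enumerate(classes)}
    return [lookup[v] for v in values]


# Step 1: check which types are actually in the column.
print(find_types(column))  # both 'str' and 'float' show up

# Step 2: encoding the mixed column fails during sorting.
try:
    encode(column)
except TypeError as exc:
    print("mixed types:", exc)

# Step 3: one possible fix is to cast every value to a single type first.
as_text = [str(v) for v in column]
codes = encode(as_text)
print(codes)
```

With a pandas column, the same type check can be done by looking at the types of the individual values rather than only at the column's dtype, since an object column can hold strings and floats (including NaN) side by side.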
Upvotes: 1