Reputation: 592
I have a theoretical question about the function LabelEncoder().fit_transform.
I am using the function / method in a classification application. It's working perfectly.
# Import
from sklearn.preprocessing import LabelEncoder
# Transform original values into encoded labels
df_data = df_data.apply(LabelEncoder().fit_transform)
However, the documentation for sklearn.preprocessing.LabelEncoder says: "This transformer should be used to encode target values, i.e. y, and not the input X".
I am applying this method across the whole dataframe, which has numeric input variables (X) and a categorical output variable (y). I thought of applying it to y to convert the objective (target) variable to a numeric type, and to X to deal with differences in magnitude between the input variables. Is this approach correct? Is there another function I can apply in place of LabelEncoder().fit_transform for the input variables? Thank you
Upvotes: 0
Views: 677
Reputation: 880
As the documentation states, LabelEncoder should only be used to transform your labels, i.e. from 'Apple', 'Orange' to 0, 1. If you have categorical input features, then look at the One Hot Encoder. Additionally, if your input X has features on differing scales, then take a look at the Standard Scaler.
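To make the split concrete, here is a minimal sketch, assuming a made-up dataframe with one numeric feature, one categorical feature, and a categorical target (the column names and data are purely illustrative):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical example data
df = pd.DataFrame({
    "weight": [150.0, 170.0, 120.0],        # numeric input feature
    "color": ["red", "orange", "red"],      # categorical input feature
    "fruit": ["Apple", "Orange", "Apple"],  # categorical target
})

# LabelEncoder: only for the target y ('Apple', 'Orange' -> 0, 1)
y = LabelEncoder().fit_transform(df["fruit"])

# OneHotEncoder: for categorical input features
# (.toarray() converts the sparse result to a dense matrix)
X_cat = OneHotEncoder().fit_transform(df[["color"]]).toarray()

# StandardScaler: for numeric input features with differing magnitudes
X_num = StandardScaler().fit_transform(df[["weight"]])
```

Each transformer is fit on its own slice of the data, so the target encoding never leaks into the feature matrix.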
Upvotes: 1