Reputation: 71
I have a pandas DataFrame with the following structure:
item_condition_id category
brand_name category
price float64
shipping category
main_category category
category category
sub_category category
hashing_feature_aa float64
hashing_feature_ab float64
Example with a portion of the data:
brand_name  shipping  main_category  category
Target      1         Women          Tops & Blouses
unknown     1         Home           Home Décor
unknown     0         Women          Jewelry
unknown     0         Women          Other
I converted the categorical (string) columns to numeric values using the code below:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for i in range(len(X)):
    X.iloc[:,i] = le.fit_transform(X.iloc[:,i])
After conversion:
brand_name  shipping  main_category  category
0           1         1              3
1           1         0              0
1           0         1              1
1           0         1              2
This works as expected, but when I try to apply inverse_transform to recover the original categories from the numeric codes, it throws an error:
for i in range(len(X)):
    X.iloc[:,i] = le.inverse_transform(X.iloc[:,i])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How can I resolve this error? What is wrong with my code?
My goal is to convert the categorical (string) features to numeric values with LabelEncoder so that I can apply sklearn.feature_selection.SelectKBest.fit_transform(X, y); without label encoding, that step fails.
Thanks
Upvotes: 0
Views: 4238
Reputation: 6859
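Based on your clarification: the problem is that you reuse the same le instance on every pass through the loop. Each call to fit_transform refits it, so after the loop it is fitted only on the last column, and inverse_transform on any other column uses the wrong set of classes. Based on your code, I would suggest keeping one encoder per column in a dict, e.g. as follows: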
from sklearn.preprocessing import LabelEncoder
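le = {}
# Loop over column positions; len(X) would give the number of rows instead.
for i in range(X.shape[1]):
    le[i] = LabelEncoder()  # a separate encoder for each column
    X.iloc[:,i] = le[i].fit_transform(X.iloc[:,i])
# do stuff
for i in range(X.shape[1]):
    X.iloc[:,i] = le[i].inverse_transform(X.iloc[:,i])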
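To check the round trip, here is a minimal self-contained sketch of the same idea. It assumes a toy frame built from the sample rows in the question, and it keys the dict by column name rather than by position; the variable names `encoders` and `original` are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame built from the sample rows in the question (an assumption,
# not the asker's real data).
X = pd.DataFrame({
    "brand_name": ["Target", "unknown", "unknown", "unknown"],
    "main_category": ["Women", "Home", "Women", "Women"],
    "category": ["Tops & Blouses", "Home Décor", "Jewelry", "Other"],
})
original = X.copy()

# One fitted encoder per column, keyed by column name.
encoders = {}
for col in X.columns:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

# ... feature selection or other numeric processing would go here ...

# Decode each column with the encoder that was fitted on that column.
for col in X.columns:
    X[col] = encoders[col].inverse_transform(X[col])
```

Assigning through `X[col]` replaces the whole column, so the column is free to change dtype between strings and integer codes.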
But as commented above, also look at this.
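One further option, not taken from this thread: scikit-learn documents LabelEncoder as intended for the target y, and provides OrdinalEncoder for encoding several feature columns at once. A rough sketch, assuming the same toy rows as in the question and a hypothetical `cat_cols` list of the string columns:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame built from the sample rows in the question (an assumption).
X = pd.DataFrame({
    "brand_name": ["Target", "unknown", "unknown", "unknown"],
    "main_category": ["Women", "Home", "Women", "Women"],
    "category": ["Tops & Blouses", "Home Décor", "Jewelry", "Other"],
})
cat_cols = ["brand_name", "main_category", "category"]  # hypothetical selection

# A single encoder learns a separate category list for every column.
enc = OrdinalEncoder()
encoded = enc.fit_transform(X[cat_cols])   # 2-D array of float codes

# inverse_transform maps all columns back in one call.
decoded = enc.inverse_transform(encoded)   # 2-D array of the original labels
```

Because the encoder stores the categories per column, there is no dict of encoders to keep track of.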
Upvotes: 1