Reputation: 79
I have one column in a CSV file containing fruit names, which I want to convert into an array.
Sample csv column:
Names:
Apple
Banana
Pear
Watermelon
Jackfruit
..
..
..
There are around 400 fruit names in the column.
I have used one-hot encoding on it, but I am unable to display the column names (each fruit name from a row of the CSV column).
My code so far is:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
dataset = pd.read_csv('D:/fruits.csv')
X= dataset.iloc[:, 0].values
labelencoder_X = LabelEncoder()
D= labelencoder_X.fit_transform(X)
D = D.reshape(-1, 1)
onehotencoder = OneHotEncoder(sparse=False, categorical_features = [0])
X = onehotencoder.fit_transform(D)
This converts the column into a NumPy array, but the column names come out as [0 1 2 3 .. ..]. I want them to be the fruit names from the CSV rows, for example [Apple Banana Pear Watermelon .. ..].
How can I retain the column names after one-hot encoding?
Upvotes: 1
Views: 745
Reputation: 1318
Original Answer:
A rather efficient way to one-hot encode would be to use pd.get_dummies.
I've applied it to sample data:
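import pandas as pd
# Build a small sample frame with one categorical column
data = {'Names': ['Apple', 'Banana', 'Pear', 'Watermelon']}
df = pd.DataFrame(data=data)
# One-hot encode every object/categorical column in the frame
df_new = pd.get_dummies(df)
print(df_new)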
Original df:
Names
0 Apple
1 Banana
2 Pear
3 Watermelon
Encoded df:
   Names_Apple  Names_Banana  Names_Pear  Names_Watermelon
0            1             0           0                 0
1            0             1           0                 0
2            0             0           1                 0
3            0             0           0                 1
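To get back to what the question asks for (a NumPy array plus the fruit names as labels), here is a minimal sketch, assuming pandas 0.24 or later for to_numpy. Calling pd.get_dummies on the Series itself rather than on the DataFrame leaves the column labels as the bare fruit names, without the Names_ prefix. dtype=int is passed because recent pandas versions return boolean columns by default. The variable names encoded, column_names and array are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['Apple', 'Banana', 'Pear', 'Watermelon']})

# Encoding the Series (not the frame) keeps the raw values as column labels
encoded = pd.get_dummies(df['Names'], dtype=int)

# The labels line up with the columns of the array, position by position
column_names = list(encoded.columns)
array = encoded.to_numpy()

print(column_names)
print(array)
```

Column i of array corresponds to column_names[i], so the names can be carried alongside the array wherever it is used.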
Edit:
Let's assume that our dataframe contains 2 Categorical & 2 Numeric features. We just want to OneHotEncode 1 of the 2 Categorical columns.
Generating dummy Data:
data = {'Names':['Apple','Banana','Pear', 'Watermelom'],
'Category' :['A','B','A','B'],
'Val1':[10,20,30,30],
'Val2':[60,70,80,90]}
df = pd.DataFrame(data=data)
        Names Category  Val1  Val2
0       Apple        A    10    60
1      Banana        B    20    70
2        Pear        A    30    80
3  Watermelom        B    30    90
If we just want to one-hot encode Names, we would do that with:
df_new = pd.get_dummies(df, columns=['Names'])
print(df_new)
You can refer to the pd.get_dummies documentation. By passing the columns argument, we encode only the columns of interest.
Encoded Output:
  Category  Val1  Val2  Names_Apple  Names_Banana  Names_Pear  Names_Watermelom
0        A    10    60            1             0           0                 0
1        B    20    70            0             1           0                 0
2        A    30    80            0             0           1                 0
3        B    30    90            0             0           0                 1
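If the encoded columns should carry the bare fruit names rather than the Names_ prefix, one option (a sketch, not the only way) is to pass empty strings for prefix and prefix_sep, both of which are parameters of pd.get_dummies. dtype=int is an assumption added here so the result shows 0/1 on recent pandas versions, which otherwise produce True/False. The names df_bare, fruit_columns and fruit_array are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Apple', 'Banana', 'Pear', 'Watermelom'],
    'Category': ['A', 'B', 'A', 'B'],
    'Val1': [10, 20, 30, 30],
    'Val2': [60, 70, 80, 90],
})

# Encode only 'Names'; an empty prefix and separator keep the raw values as labels
df_bare = pd.get_dummies(df, columns=['Names'], prefix='', prefix_sep='', dtype=int)

# Pick out the encoded columns and convert just those to a NumPy array
fruit_columns = [c for c in df_bare.columns if c not in ('Category', 'Val1', 'Val2')]
fruit_array = df_bare[fruit_columns].to_numpy()

print(df_bare)
```

Note that dropping the prefix can produce duplicate labels if a category value happens to match the name of another column, so the default prefix is the safer choice when that is possible.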
Upvotes: 2