TensorFlow: convert string column into multiple columns for classification

Question

I have a CSV file in the following format:

feature_1 | feature_2 | ... | feature_n | label

where the label is a string. I have successfully read the file with pandas:

import pandas
data = pandas.read_csv("dataset/iris.csv", delimiter=",")
proced_data = data.values

However, as shown in the TensorFlow MNIST example, the labels are formatted as

label_0 | label_1 | ... | label_9

where, for each sample, exactly one of the label columns is 1 and all the others are 0 (one-hot encoding). Since the label in my proced_data is a single column of strings, what is the fastest way to convert it to this MNIST-like format?
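To make the target layout concrete, here is a minimal NumPy sketch of what "MNIST-like" means for string labels. It assumes the labels are already in a 1-D array; the label values and the helper name `to_one_hot` are hypothetical and only for illustration. Each distinct string becomes one column, and each row has a single 1.

```python
import numpy as np

def to_one_hot(labels):
    """Hypothetical helper: map a 1-D array of strings to one-hot rows."""
    # classes: sorted unique label strings; indices: position of each label in classes
    classes, indices = np.unique(labels, return_inverse=True)
    # Picking rows of an identity matrix gives one 1 per row, 0 elsewhere
    one_hot = np.eye(len(classes), dtype=np.float32)[indices]
    return classes, one_hot

labels = np.array(["setosa", "versicolor", "setosa", "virginica"])
classes, one_hot = to_one_hot(labels)
print(classes)
print(one_hot)
```

The column order follows `classes`, which `np.unique` returns sorted, so column 0 corresponds to "setosa" in this example.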

Thanks.

Dat Tran · Accepted Answer

pandas has a one-hot encoder built in, so you can simply use pd.get_dummies(...) to convert the labels into dummy (indicator) columns, one column per distinct label.

In your case,

import pandas as pd
data = pd.read_csv("dataset/iris.csv", delimiter=",")
y = pd.get_dummies(data['label'])
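A slightly fuller sketch, assuming the CSV has feature columns plus a column named `label` (the small in-memory CSV below stands in for `dataset/iris.csv` and its values are made up). It asks `get_dummies` for `float32` output, since recent pandas versions return boolean columns by default, and keeps the column names so that a model's predicted row can be mapped back to a label string with `argmax`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset/iris.csv
csv_text = """feature_1,feature_2,label
5.1,3.5,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
4.9,3.0,setosa
"""
data = pd.read_csv(io.StringIO(csv_text), delimiter=",")

# One-hot encode the string labels; columns are the sorted distinct labels
y_df = pd.get_dummies(data["label"], dtype=np.float32)
class_names = list(y_df.columns)
y = y_df.to_numpy()

# Features as a float32 matrix, without the label column
x = data.drop(columns="label").to_numpy(dtype=np.float32)

# Decoding: the argmax of each row indexes into class_names.
# Here y itself stands in for model predictions.
decoded = [class_names[i] for i in y.argmax(axis=1)]
print(class_names)
print(decoded)
```

Keeping `class_names` matters because the one-hot columns carry no label text once converted to a NumPy array.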

One more note: the usual convention is to write import pandas as pd and then call pd.read_csv(...). This alias is the common way of importing the pandas package.
