Reputation: 2548
I have a CSV file in the following format:
feature_1 | feature_2 | ... | feature_n | label
where the label is a string. I have successfully read the file with pandas:
data = pandas.read_csv("dataset/iris.csv", delimiter=",")
proced_data = data.values
However, in the TensorFlow MNIST example, the labels are formatted as
label_0 | label_1 | ... | label_9
where, for each sample, exactly one of the labels is 1 and all the others are 0. Since the label in my proced_data is a single column of strings, what is the fastest way to convert it to the MNIST-like format?
Thanks.
Upvotes: 1
Views: 482
Reputation: 2392
pandas has a built-in one-hot encoder, so you can just use pd.get_dummies(..) to convert the labels to dummy variables.
In your case,
import pandas as pd
data = pd.read_csv("dataset/iris.csv", delimiter=",")
y = pd.get_dummies(data['label'])
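To see what that produces without needing the CSV, here is a minimal sketch on a small in-memory frame. The column names and the four sample rows are made up for illustration (I'm assuming your label column is actually named label), and dtype=float is just one choice so the result holds 0.0/1.0 rather than booleans.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two feature columns and a string label.
data = pd.DataFrame({
    "feature_1": [5.1, 7.0, 6.3, 4.9],
    "feature_2": [3.5, 3.2, 3.3, 3.0],
    "label": ["setosa", "versicolor", "virginica", "setosa"],
})

# One column per distinct label value; each row has a single 1.0.
y = pd.get_dummies(data["label"], dtype=float)

# Plain NumPy arrays, if that is what the training code expects.
features = data.drop(columns="label").to_numpy()
labels = y.to_numpy()          # shape: (n_samples, n_classes)
class_names = list(y.columns)  # which column corresponds to which label
```

Keeping class_names around is handy later for mapping a predicted column index back to the original string label.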
Btw, one more note: you should use import pandas as pd and then call pd.read_csv(..). This is the common convention for importing pandas.
Upvotes: 1