Nastja Kr

Reputation: 169

Converting string labels to one-hot vectors for machine learning

Hi everyone. I have a list of strings:

labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

There are 3 different labels. I first want to map them to the numbers 1, 2 and 3, and then build a one-hot vector from each of them, for example label 3 --> 0 0 1. Does somebody have an idea how to do this?

Upvotes: 1

Views: 71

Answers (1)

L3viathan

Reputation: 27273

A simple, library-less solution would be:

labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

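# Map each distinct label to a column index. Sorting makes the order
# reproducible; iterating over a bare set can differ between runs.
mapping = {label: i for i, label in enumerate(sorted(set(labels)))}

one_hot = []
for label in labels:
    entry = [0] * len(mapping)   # all zeros, one slot per distinct label
    entry[mapping[label]] = 1    # mark the slot that belongs to this label
    one_hot.append(entry)

Result: [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]] (columns are in alphabetical order: Antonym, Not relevant, Synonym).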
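If you also need the intermediate numbers 1, 2 and 3 that the question asks for, a minimal sketch could look like the following. It assumes the IDs are assigned in order of first appearance, and `label_to_id` and `one_hot_from_id` are illustrative names, not part of any library:

```python
labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

# Assumption: IDs start at 1 and follow the order of first appearance.
label_to_id = {}
for label in labels:
    if label not in label_to_id:
        label_to_id[label] = len(label_to_id) + 1

# Integer ID for every entry in the original list.
ids = [label_to_id[label] for label in labels]


def one_hot_from_id(label_id, num_classes):
    """Return a list with a 1 at position label_id - 1 and 0 elsewhere."""
    vector = [0] * num_classes
    vector[label_id - 1] = 1
    return vector


one_hot = [one_hot_from_id(i, len(label_to_id)) for i in ids]

print(ids)      # [1, 2, 3, 1, 2]
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

With this numbering, "Not relevant" gets ID 3 and therefore the vector 0 0 1, as in the example from the question.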

But you might want to look into sklearn, specifically sklearn.preprocessing.OneHotEncoder.
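A rough sketch of the sklearn route, assuming scikit-learn and NumPy are installed. It relies on the encoder's default sparse output and converts it with `toarray()`, so the values come back as floats rather than ints:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = ["Synonym", "Antonym", "Not relevant", "Synonym", "Antonym"]

# OneHotEncoder expects 2-D input of shape (n_samples, n_features),
# so the flat list is reshaped into a single column.
X = np.array(labels).reshape(-1, 1)

encoder = OneHotEncoder()
# The default output is a SciPy sparse matrix; toarray() makes it dense.
one_hot = encoder.fit_transform(X).toarray()

# categories_[0] holds the labels in the order used for the columns.
print(encoder.categories_[0])
print(one_hot)
```

The fitted encoder remembers the column order, so the same `encoder` can be reused with `transform` on new data and the columns will stay consistent.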

Upvotes: 1
