lima0

Reputation: 121

Map classes to Pandas one hot encoding

Given the following sequence:

[I, Z, S, I, I, J, N, J, I]

and the following Pandas data frame:

char  fricative  nasal  lateral  labial  coronal  dorsal  frontal
I             0      0        0       0        0       0        1
J             0      0        1       0        1       0        1
N             0      1        0       0        0       1        0
S             1      0        0       0        1       0        0
Z             1      0        0       0        1       0        0

How can I map each character in the sequence to its respective one-hot vector from the data frame?

Upvotes: 2

Views: 159

Answers (1)

Dani Mesejo

Reputation: 61930

Set char as the index, then look up the whole sequence with .loc:

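# Index the feature table by character so rows can be selected by label.
df = df.set_index("char")
# Select one row per element of the sequence (repeats allowed), as nested lists.
res = df.loc[sequence, :].to_numpy().tolist()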

Output

[[0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0, 1], [0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 1]]
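
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the sequence is a plain Python list of single-character strings and rebuilds the data frame by hand from the table in the question; the names `features` and `one_hot` are only illustrative.

```python
import pandas as pd

# Feature table copied from the question (assumed to match the asker's frame).
columns = ["char", "fricative", "nasal", "lateral", "labial",
           "coronal", "dorsal", "frontal"]
data = [
    ["I", 0, 0, 0, 0, 0, 0, 1],
    ["J", 0, 0, 1, 0, 1, 0, 1],
    ["N", 0, 1, 0, 0, 0, 1, 0],
    ["S", 1, 0, 0, 0, 1, 0, 0],
    ["Z", 1, 0, 0, 0, 1, 0, 0],
]
df = pd.DataFrame(data, columns=columns)

# Assumption: the sequence holds string labels, not bare names.
sequence = ["I", "Z", "S", "I", "I", "J", "N", "J", "I"]

# Index by character, then select one row per sequence element.
features = df.set_index("char")
one_hot = features.loc[sequence, :].to_numpy().tolist()
print(one_hot)
```

Note that `.loc` raises a `KeyError` if the sequence contains a character that is missing from the table, so any unknown symbols need to be handled before the lookup.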

UPDATE

If you also want the names of the active categories, you can index the columns directly with a boolean mask:

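# Skip this line if "char" is already the index (e.g. after the snippet above).
df = df.set_index("char")
# For each selected row, keep the column names where the value is 1.
res = [df.columns[row.astype(bool)].tolist() for row in df.loc[sequence, :].to_numpy()]
print(res)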

Output

[['frontal'], ['fricative', 'coronal'], ['fricative', 'coronal'], ['frontal'], ['frontal'], ['lateral', 'coronal', 'frontal'], ['nasal', 'dorsal'], ['lateral', 'coronal', 'frontal'], ['frontal']]
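
If the sequence is long, one possible variation is to compute the active categories once per distinct character and then reuse them. This is only a sketch under the same assumptions as the question's table; the `active` dictionary and the `features` name are hypothetical helpers, not part of the original answer.

```python
import pandas as pd

# Feature table copied from the question (assumed to match the asker's frame).
columns = ["char", "fricative", "nasal", "lateral", "labial",
           "coronal", "dorsal", "frontal"]
data = [
    ["I", 0, 0, 0, 0, 0, 0, 1],
    ["J", 0, 0, 1, 0, 1, 0, 1],
    ["N", 0, 1, 0, 0, 0, 1, 0],
    ["S", 1, 0, 0, 0, 1, 0, 0],
    ["Z", 1, 0, 0, 0, 1, 0, 0],
]
features = pd.DataFrame(data, columns=columns).set_index("char")

sequence = ["I", "Z", "S", "I", "I", "J", "N", "J", "I"]

# Build the character -> active column names lookup once per table row.
active = {
    char: features.columns[row.to_numpy().astype(bool)].tolist()
    for char, row in features.iterrows()
}

# Map the sequence through the lookup; repeated characters reuse the same entry.
res = [active[char] for char in sequence]
print(res)
```

The result should match the list comprehension above; the difference is only that the boolean mask is built once per table row instead of once per sequence element.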

Upvotes: 2
