Reputation: 121
Given the following sequence:
[I, Z, S, I, I, J, N, J, I]
and the following Pandas data frame:
char fricative nasal lateral labial coronal dorsal frontal
I 0 0 0 0 0 0 1
J 0 0 1 0 1 0 1
N 0 1 0 0 0 1 0
S 1 0 0 0 1 0 0
Z 1 0 0 0 1 0 0
How can I map each character in the sequence to its respective one-hot vector from the data frame?
Upvotes: 2
Views: 159
Reputation: 61930
Use:
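df = df.set_index("char")  # index the rows by character so they can be looked up by label
res = df.loc[sequence, :].to_numpy().tolist()  # one row per sequence element, repeats included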
Output
[[0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0, 1], [0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 1]]
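In case it helps to run this end to end, here is a self-contained sketch. It assumes the sequence is a list of strings and rebuilds the data frame from the table in the question; the name `indexed` is only mine, used so that `df` itself is left unchanged.

```python
import pandas as pd

# Assumption: the sequence is a plain list of single-character strings.
sequence = ["I", "Z", "S", "I", "I", "J", "N", "J", "I"]

# The feature table from the question, one list per column.
df = pd.DataFrame({
    "char":      ["I", "J", "N", "S", "Z"],
    "fricative": [0, 0, 0, 1, 1],
    "nasal":     [0, 0, 1, 0, 0],
    "lateral":   [0, 1, 0, 0, 0],
    "labial":    [0, 0, 0, 0, 0],
    "coronal":   [0, 1, 0, 1, 1],
    "dorsal":    [0, 0, 1, 0, 0],
    "frontal":   [1, 1, 0, 0, 0],
})

# Index by character, then select rows by label in sequence order.
indexed = df.set_index("char")
res = indexed.loc[sequence, :].to_numpy().tolist()
print(res)
```

Label-based selection with a list returns one row per requested label, so repeated characters in the sequence simply repeat the corresponding row.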
UPDATE
If you also want the names of the active categories, you can index directly into the columns with a boolean mask, as below:
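df = df.set_index("char")  # only needed if "char" is not already the index
# For each selected row, keep the column names where the value is 1
res = [df.columns[row.astype(bool)].tolist() for row in df.loc[sequence, :].to_numpy()]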
print(res)
Output
[['frontal'], ['fricative', 'coronal'], ['fricative', 'coronal'], ['frontal'], ['frontal'], ['lateral', 'coronal', 'frontal'], ['nasal', 'dorsal'], ['lateral', 'coronal', 'frontal'], ['frontal']]
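One caveat with the `.loc` lookups above: in recent pandas versions, passing a list that contains a label missing from the index raises a `KeyError`. If the sequence might contain characters that are not in the table, a possible workaround is `reindex` with a fill value. This is only a sketch; the character `"X"` and the names `table` and `seq_with_unknown` are made up for illustration, and whether an all-zero vector is the right encoding for an unknown character depends on your use case.

```python
import pandas as pd

columns = ["fricative", "nasal", "lateral", "labial", "coronal", "dorsal", "frontal"]

# Same feature table as in the question, keyed by character.
rows = {
    "I": [0, 0, 0, 0, 0, 0, 1],
    "J": [0, 0, 1, 0, 1, 0, 1],
    "N": [0, 1, 0, 0, 0, 1, 0],
    "S": [1, 0, 0, 0, 1, 0, 0],
    "Z": [1, 0, 0, 0, 1, 0, 0],
}
table = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Hypothetical sequence: "X" does not appear in the table.
seq_with_unknown = ["I", "X", "N"]

# reindex does not raise on missing labels; fill_value=0 gives them an all-zero row.
vectors = table.reindex(seq_with_unknown, fill_value=0).to_numpy().tolist()
print(vectors)
```

Note that `reindex` requires the table's index to be unique, which holds here since each character appears once.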
Upvotes: 2