Reputation: 43
I have an array of strings, and I'd like to convert each string into a boolean array over the alphabet (A-Z), where a position is 1 if that letter appears in the string.
My goal is to do this in a vectorized way and avoid any explicit looping.
E.g.
Input:
A = np.array(['A'])
B = np.array(['AB'])
C = np.array(['AZ'])
D = np.array(['AZ','BAZ'])
Output:
A = np.array([1,0,0,0,...0])
B = np.array([1,1,0,0,...0])
C = np.array([1,0,0,0,...1])
D = np.array([[1,0,0,0,...1], [1,1,0,0,...1]])
Upvotes: 2
Views: 118
Reputation: 71707
map with MultiLabelBinarizer.transform
We can fit the MultiLabelBinarizer on the capital letters A-Z, then transform the arrays A, B, C, and D using its transform method:
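import string
from sklearn.preprocessing import MultiLabelBinarizer

# Fit on the 26 capital letters so each output column corresponds to one letter (A-Z)
mlb = MultiLabelBinarizer().fit([*string.ascii_uppercase])
# Each string is treated as a collection of letters; every result is 2-D, shape (n_strings, 26)
A, B, C, D = map(mlb.transform, (A, B, C, D))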
>>> A
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0]])
>>> B
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0]])
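If a scikit-learn dependency is not wanted, the same encoding can probably be obtained with NumPy alone. The following is only a sketch under stated assumptions: the input is a 1-D array of Python strings (NumPy's fixed-width Unicode dtype), and only the capital letters A-Z should be flagged. The name letters_to_indicator is a hypothetical helper for illustration, not part of NumPy or scikit-learn.

```python
import numpy as np

def letters_to_indicator(strings):
    """Hypothetical helper: one row of 26 0/1 flags (A-Z) per input string."""
    # Assumption: `strings` is a 1-D sequence of str; NumPy stores it as fixed-width Unicode.
    arr = np.ascontiguousarray(strings, dtype=np.str_)
    # Each character occupies 4 bytes, so the itemsize gives the padded string width.
    width = arr.dtype.itemsize // 4
    # Reinterpret the character buffer as code points, shape (n_strings, width).
    # Shorter strings are padded with 0, which never matches a letter.
    codes = arr.view(np.uint32).reshape(len(arr), width)
    # Code points of 'A'..'Z', one per output column.
    alphabet = np.arange(ord("A"), ord("Z") + 1, dtype=np.uint32)
    # Broadcast to (n_strings, width, 26), then check whether any character matches each letter.
    return (codes[:, :, None] == alphabet).any(axis=1).astype(int)

D = np.array(["AZ", "BAZ"])
result = letters_to_indicator(D)  # shape (2, 26)
```

Like the MultiLabelBinarizer version, this returns a 2-D array even for a single string. The broadcast comparison builds an intermediate of shape (n_strings, width, 26), so it trades memory for avoiding a Python-level loop.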
Upvotes: 1