Reputation: 1309
Given a list of list, in which each sublist is a bucket filled with letters, like:
L=[['a','c'],['b','e'],['d']]
I would like to encode each sublist as one row in my DataFrame
like this:
a b c d e
0 1 0 1 0 0
1 0 1 0 0 1
2 0 0 0 1 0
Let's assume the letter is just from 'a' to 'e'. I am wondering how to complete a function to do so.
Upvotes: 3
Views: 45
Reputation: 164613
You can use the sklearn
library:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
L = [['a', 'c'], ['b', 'e'], ['d']]
mlb = MultiLabelBinarizer()
res = pd.DataFrame(mlb.fit_transform(L),
columns=mlb.classes_)
print(res)
a b c d e
0 1 0 1 0 0
1 0 1 0 0 1
2 0 0 0 1 0
Upvotes: 4