Garvey
Garvey

Reputation: 1309

Encode multiple label in DataFrame

Given a list of list, in which each sublist is a bucket filled with letters, like:

L=[['a','c'],['b','e'],['d']]

I would like to encode each sublist as one row in my DataFrame like this:

    a   b   c   d   e
0   1   0   1   0   0
1   0   1   0   0   1
2   0   0   0   1   0

Let's assume the letter is just from 'a' to 'e'. I am wondering how to complete a function to do so.

Upvotes: 3

Views: 45

Answers (1)

jpp
jpp

Reputation: 164613

You can use the sklearn library:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['a', 'c'], ['b', 'e'], ['d']]

mlb = MultiLabelBinarizer()

res = pd.DataFrame(mlb.fit_transform(L),
                   columns=mlb.classes_)

print(res)

   a  b  c  d  e
0  1  0  1  0  0
1  0  1  0  0  1
2  0  0  0  1  0

Upvotes: 4

Related Questions