Reputation: 275
Here is my question. Given a table:
Id type
0 1 [a,b]
1 2 [c]
2 3 [a,d]
I want to convert it into the form of:
Id a b c d
0 1 1 1 0 0
1 2 0 0 1 0
2 3 1 0 0 1
I need a very efficient way to convert a large table. Any comments are welcome.
====================================
I have received several good answers and really appreciate your help.
Now a new question has come up: my laptop does not have enough memory to generate the whole DataFrame with pd.get_dummies.
Is there any way to generate a sparse vector row by row and stack them together?
Upvotes: 0
Views: 821
Reputation: 12801
Try this:
pd.get_dummies(df.type.apply(lambda x: pd.Series([i for i in x])))
To explain:
df.type.apply(lambda x: pd.Series([i for i in x]))
gives you one column per index position in your lists. You can then use get_dummies
to get an indicator column for each value:
pd.get_dummies(df.type.apply(lambda x: pd.Series([i for i in x])))
outputs:
a c b d
0 1 0 1 0
1 0 1 0 0
2 1 0 0 1
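A variant of the same idea, sketched here assuming pandas 0.25 or later (needed for Series.explode): flatten the lists to one value per row, take dummies, then collapse back to one row per original index. This yields one column per value regardless of where it sits in the list, whereas the positional approach may label columns with a position prefix such as 0_a in recent pandas versions. The sample frame is reconstructed from the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({'Id': [1, 2, 3], 'type': [['a', 'b'], ['c'], ['a', 'd']]})

# One list element per row; the original index is repeated for each element.
flat = df['type'].explode()

# Indicator columns per value, then collapse rows that share an index.
dummies = pd.get_dummies(flat, dtype=int).groupby(level=0).max()

result = df[['Id']].join(dummies)
print(result)
```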
Upvotes: 1
Reputation: 4499
Try this:
>>> df
Id type
0 1 [a, b]
1 2 [c]
2 3 [a, d]
>>> df2 = pd.DataFrame([x for x in df['type'].apply(
... lambda item: dict(map(
... lambda x: (x,1),
... item))
... ).values]).fillna(0)
>>> df2.join(df)
a b c d Id type
0 1 1 0 0 1 [a, b]
1 0 0 1 0 2 [c]
2 1 0 0 1 3 [a, d]
It basically converts the list of lists into a list of dicts and constructs a DataFrame out of that:
[ ['a', 'b'], ['c'], ['a', 'd'] ] # list of lists
[ {'a':1, 'b':1}, {'c':1}, {'a':1, 'd':1} ] # list of dicts; make a DataFrame out of this
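For the follow-up about memory, the same row-by-row idea can fill a sparse matrix instead of a dense DataFrame. The following is only a sketch, assuming SciPy is installed and pandas is 0.25 or later; the sample frame and the names col_index, rows and cols are illustrative and not taken from the answers above.

```python
import pandas as pd
from scipy import sparse

# Sample data reconstructed from the question.
df = pd.DataFrame({'Id': [1, 2, 3], 'type': [['a', 'b'], ['c'], ['a', 'd']]})

col_index = {}        # label -> column number, assigned in first-seen order
rows, cols = [], []   # coordinates of the non-zero entries

for row, labels in enumerate(df['type']):
    # dict.fromkeys drops repeated labels within a row while keeping order,
    # so duplicate coordinates are not summed by csr_matrix.
    for label in dict.fromkeys(labels):
        col = col_index.setdefault(label, len(col_index))
        rows.append(row)
        cols.append(col)

# Only the non-zero entries are stored.
matrix = sparse.csr_matrix(
    ([1] * len(rows), (rows, cols)),
    shape=(len(df), len(col_index)),
    dtype='int8',
)
columns = sorted(col_index, key=col_index.get)

# Optional: wrap as a DataFrame backed by sparse columns.
sparse_df = pd.DataFrame.sparse.from_spmatrix(matrix, columns=columns)
```

For a table too large to hold at once, the loop could consume the rows in chunks, since only the coordinate lists and the label dictionary need to stay in memory.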
Upvotes: 1