3c.

Reputation: 275

How to split one column into many columns and count the frequency

Here is the question I have in mind. Given a table:

   Id   type
0   1    [a,b]
1   2     [c]
2   3     [a,d]

I want to convert it into the form of:

   Id     a  b  c  d
0   1     1  1  0  0
1   2     0  0  1  0
2   3     1  0  0  1

I need a very efficient way to convert a large table. Any comments are welcome.

====================================

I have received several good answers, and really appreciate your help.

Now a new question has come up: my laptop does not have enough memory to generate the whole dataframe with pd.get_dummies.

Is there any way to generate a sparse vector row by row and stack them together?

Upvotes: 0

Views: 821

Answers (2)

JAB

Reputation: 12801

Try this:

pd.get_dummies(df.type.apply(lambda x: pd.Series([i for i in x])))

To explain:

df.type.apply(lambda x: pd.Series([i for i in x]))

gives you one column per index position in your lists. You can then use get_dummies to get an indicator column for each value:

pd.get_dummies(df.type.apply(lambda x: pd.Series([i for i in x])))

outputs:

    a   c   b   d
0   1   0   1   0
1   0   1   0   0
2   1   0   0   1
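
On recent pandas versions, get_dummies applied to a multi-column frame may prefix each indicator with the source column name (0_a, 1_b, ...), so the same value appearing at different list positions ends up in separate columns. A minimal sketch of one alternative that yields a single column per value, assuming every list element is a string and none contains the "|" separator (the sample frame below just mirrors the one in the question):

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({"Id": [1, 2, 3], "type": [["a", "b"], ["c"], ["a", "d"]]})

# Join each list into one delimited string ("a|b", "c", "a|d"), then let
# pandas split it back into one 0/1 indicator column per distinct value.
dummies = df["type"].str.join("|").str.get_dummies(sep="|")

# Put the Id column back in front of the indicator columns.
result = df[["Id"]].join(dummies)
print(result)
```

The indicator columns come out sorted by value, so the layout matches the table requested in the question.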

Upvotes: 1

shanmuga

Reputation: 4499

Try this:

>>> df
   Id    type
0   1  [a, b]
1   2     [c]
2   3  [a, d]
>>> # one dict per row, mapping each list item to 1; missing keys become 0
>>> df2 = pd.DataFrame(
...     [{x: 1 for x in item} for item in df['type']]
... ).fillna(0).astype(int)
>>> df2.join(df)
   a  b  c  d  Id    type
0  1  1  0  0   1  [a, b]
1  0  0  1  0   2     [c]
2  1  0  0  1   3  [a, d]

It basically converts the list of lists into a list of dicts and constructs a DataFrame from it:

[ ['a', 'b'], ['c'], ['a', 'd'] ]  # list of lists
[ {'a':1, 'b':1}, {'c':1}, {'a':1, 'd':1} ]  # list of dicts; make a DataFrame out of this
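
For the memory follow-up in the question, the same idea can be kept sparse. The sketch below is only one possible approach and assumes SciPy is available: it walks the lists row by row, records the (row, column) position of every 1, and builds a single compressed sparse row matrix from those coordinates instead of a dense frame. The names vocab, rows, cols and matrix are illustrative, not from the answers above.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Sample data mirroring the table in the question.
df = pd.DataFrame({"Id": [1, 2, 3], "type": [["a", "b"], ["c"], ["a", "d"]]})

# Map each distinct value to a column position (sorted for a stable order).
vocab = {v: j for j, v in enumerate(sorted({v for items in df["type"] for v in items}))}

# Collect the coordinates of the non-zero entries, one row at a time.
rows, cols = [], []
for i, items in enumerate(df["type"]):
    for v in set(items):  # set() so a repeated value is marked only once
        rows.append(i)
        cols.append(vocab[v])

# Only the 1s are stored; everything else is an implicit 0.
matrix = sparse.csr_matrix(
    (np.ones(len(rows), dtype=np.int8), (rows, cols)),
    shape=(len(df), len(vocab)),
)
print(list(vocab))
print(matrix.toarray())  # densify only to inspect a small example
```

If a labelled frame is still needed, pd.DataFrame.sparse.from_spmatrix(matrix, columns=list(vocab)) wraps the matrix in sparse pandas columns without densifying it.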

Upvotes: 1
