ananthbv
ananthbv

Reputation: 195

pandas create column with names from values and substitute with True/False

I have a dataframe like this:

df = pd.DataFrame({"id":[1, 1, 1, 2, 2, 2, 2, 3, 3], "val":["A12", "B23", "C34", "A12", "C34", "E45", "F56", "G67", "B23"]})
print(df)
   id  val
0   1  A12
1   1  B23
2   1  C34
3   2  A12
4   2  C34
5   2  E45
6   2  F56
7   3  G67
8   3  B23

How do I convert it to look like this?

   id  A12  B23  C34  E45  F56  G67
0   1    1    1    1    0    0    0
1   2    1    0    1    1    1    0
2   3    0    1    0    0    0    1

I tried pivot and unstack but since the number of values in the 'val' column can be different for each 'id', I'm not able to create a master list of columns and then somehow fill the values in those columns. Please help.

Upvotes: 2

Views: 67

Answers (3)

Andrej Kesely
Andrej Kesely

Reputation: 195438

You can use pandas.get_dummies() and then groupby():

print( pd.concat([df, pd.get_dummies(df['val'])], axis=1).groupby('id').sum().reset_index() )

Prints:

   id  A12  B23  C34  E45  F56  G67
0   1    1    1    1    0    0    0
1   2    1    0    1    1    1    0
2   3    0    1    0    0    0    1

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150745

Another way with get_dummies and groupby:

(pd.get_dummies(df.val)
   .groupby(df['id'])
   .sum()
   .reset_index()
)

Or set_index and get_dummies:

(pd.get_dummies(df.set_index('id').val)
   .sum(level='id')
   .reset_index()
)

Output:

   id  A12  B23  C34  E45  F56  G67
0   1    1    1    1    0    0    0
1   2    1    0    1    1    1    0
2   3    0    1    0    0    0    1

Upvotes: 2

SimonR
SimonR

Reputation: 1824

Try crosstab:

pd.crosstab(df.id, df.val).reset_index()

Upvotes: 4

Related Questions