Reputation: 747
I have the next table:
| Name | TagID | Key |
|------|-------|-------|
| a | 285 | 1 |
| a | 101 | 0 |
| a | 200 | 1 |
| a | 1601 | 0 |
| a | 1991 | 0 |
| a | 7075 | 0 |
| b | 285 | 1 |
| b | 132 | 0 |
| b | 101 | 0 |
| b | 200 | 1 |
| b | 165 | 0 |
| b | 1617 | 0 |
| b | 160 | 0 |
| b | 334 | 1 |
| c | 285 | 1 |
| c | 1123 | 1 |
| c | 200 | 0 |
I want to get a table with the shape (unique Names
× unique TagID
) and with 1 if TagID
has Key
= 1.
Question: How to do this in the most efficient way?
Just, I have about 1 mln rows and 3k unique TagID
.
E.g.:
| | 101 | 132 | 160 | 165 | 200 | 285 | 334 | 1123 | 1601 | 1617 | 1991 | 7075 |
|---|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|------|
| a | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| b | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| c | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Upvotes: 1
Views: 73
Reputation: 862611
Use pivot
, replace missing values and cast to integers or set_index
with unstack
:
df1 = df.pivot('Name','TagID','Key').fillna(0).astype(int)
#alternative
#df1 = df.set_index(['Name','TagID'])['Key'].unstack(fill_value=0)
print (df1)
TagID 101 132 160 165 200 285 334 1123 1601 1617 1991 7075
Name
a 0 0 0 0 1 1 0 0 0 0 0 0
b 0 0 0 0 1 1 1 0 0 0 0 0
c 0 0 0 0 0 1 0 1 0 0 0 0
EDIT: If get:
ValueError: Index contains duplicate entries, cannot reshape
it means duplicates in Name
and TagID
, so is necessary aggregate max
:
df2 = df.groupby(['Name','TagID'])['Key'].max().unstack(fill_value=0)
#alternative
df2 = df.pivot_table(index='Name',
columns='TagID',
values='Key',
fill_value=0,
aggfunc='max')
print (df2)
TagID 101 132 160 165 200 285 334 1123 1601 1617 1991 7075
Name
a 0 0 0 0 1 1 0 0 0 0 0 0
b 0 0 0 0 1 1 1 0 0 0 0 0
c 0 0 0 0 0 1 0 1 0 0 0 0
Upvotes: 1