Reputation: 13
I've got one column with Primary ID numbers, and each of these Primary ID numbers can have up to 3 Secondary ID numbers associated with it. I want to pivot the secondary IDs so they all appear in up to 3 columns to the right of just one instance of each Primary ID.
Currently it looks like this:
Primary ID | Secondary ID |
---|---|
1 | 234234 |
1 | 435234 |
1 | 22233 |
2 | 334342 |
2 | 543236 |
2 | 134623 |
3 | 8475623 |
3 | 3928484 |
4 | 3723429 |
5 | 3945857 |
5 | 11112233 |
5 | 9878976 |
I want it to look like this:
Primary ID | Secondary 1 | Secondary 2 | Secondary 3 |
---|---|---|---|
1 | 234234 | 435234 | 22233 |
2 | 334342 | 543236 | 134623 |
3 | 8475623 | 3928484 | - |
4 | 3723429 | - | - |
5 | 3945857 | 11112233 | 9878976 |
Not sure how to get the column headers there and probably where my issues are coming from when I try to use pivot or pivot table with pandas.
Upvotes: 1
Views: 94
Reputation: 14949
Here's one way:
df = (
df.pivot_table(
index='Primary ID',
columns=df.groupby('Primary ID').cumcount().add(1),
values='Secondary ID'
).add_prefix('Secondary').reset_index()
)
Alternative:
df = df.assign(t=df.groupby('Primary ID').cumcount().add(
1)).set_index(['Primary ID', 't']).unstack(-1)
Primary ID Secondary1 Secondary2 Secondary3
0 1 234234.0 435234.0 22233.0
1 2 334342.0 543236.0 134623.0
2 3 8475623.0 3928484.0 NaN
3 4 3723429.0 NaN NaN
4 5 3945857.0 11112233.0 9878976.0
Upvotes: 1