Reputation: 1949
I'm trying to pivot (or unstack?) a data frame based on an ID field from something like this:
ID Col1 Col2
1 a aa
2 b bb
3 c cc
3 d dd
To this:
ID Col1 Col2 Col1 Col2
1 a aa
2 b bb
3 c cc d dd
So i'm trying:
df2 = df.pivot(None, 'ID')
Which gives me error:
builtins.IndexError: index 1796516957 is out of bounds for axis 0 with size 1796298784
What am i doing wrong?
Upvotes: 1
Views: 51
Reputation: 862511
Use cumcount
+ set_index
+ unstack
for MultiIndex
columns:
a = df.groupby('ID').cumcount().astype(str)
df = df.set_index(['ID', a]).unstack(fill_value='').sort_index(level=1, axis=1)
Alternative with apply
:
c = ['Col1','Col2']
df = (df.groupby('ID')[c]
.apply(lambda x: pd.DataFrame(x.values, columns=c))
.unstack(fill_value='')
.sort_index(level=1, axis=1))
print (df)
Col1 Col2 Col1 Col2
0 0 1 1
ID
1 a aa
2 b bb
3 c cc d dd
For duplicates column names use droplevel
:
df.columns = df.columns.droplevel(1)
print (df)
Col1 Col2 Col1 Col2
ID
1 a aa
2 b bb
3 c cc d dd
For unique columns names use join
:
df.columns = df.columns.map('_'.join)
print (df)
Col1_0 Col2_0 Col1_1 Col2_1
ID
1 a aa
2 b bb
3 c cc d dd
Upvotes: 1