Bera
Bera

Reputation: 1949

How to pivot data frame

I'm trying to pivot (or unstack?) a data frame based on an ID field from something like this:

ID  Col1    Col2
1   a   aa
2   b   bb
3   c   cc
3   d   dd

To this:

ID  Col1    Col2    Col1    Col2
1   a   aa      
2   b   bb      
3   c   cc           d      dd

So i'm trying:

df2 = df.pivot(None, 'ID')

Which gives me error:

builtins.IndexError: index 1796516957 is out of bounds for axis 0 with size 1796298784

What am i doing wrong?

Upvotes: 1

Views: 51

Answers (1)

jezrael
jezrael

Reputation: 862511

Use cumcount + set_index + unstack for MultiIndex columns:

a = df.groupby('ID').cumcount().astype(str)
df = df.set_index(['ID', a]).unstack(fill_value='').sort_index(level=1, axis=1)

Alternative with apply:

c = ['Col1','Col2']
df = (df.groupby('ID')[c]
       .apply(lambda x: pd.DataFrame(x.values, columns=c))
       .unstack(fill_value='')
       .sort_index(level=1, axis=1))
print (df)
   Col1 Col2 Col1 Col2
      0    0    1    1
ID                    
1     a   aa          
2     b   bb          
3     c   cc    d   dd

For duplicates column names use droplevel:

df.columns = df.columns.droplevel(1)
print (df)

   Col1 Col2 Col1 Col2
ID                    
1     a   aa          
2     b   bb          
3     c   cc    d   dd

For unique columns names use join:

df.columns = df.columns.map('_'.join)
print (df)

   Col1_0 Col2_0 Col1_1 Col2_1
ID                            
1       a     aa              
2       b     bb              
3       c     cc      d     dd

Upvotes: 1

Related Questions