brno792
brno792

Reputation: 6799

Reformat Pandas Data Frame with 1 Column's Values as Columns and other Columns as rows

I have a df structured like:

name   date   A   B   C
n1     07/01  a   b   c
n1     06/01  aa  bb  cc
n1     05/01  aaa bbb ccc
...

I need to structure the dataframe so it looks like:

name   letters   05/01   06/01   07/01
n1     A        aaa     aa      a
n1     B        bbb     bb      b
n1     C        ccc     cc      c
...

I can pull all the columns into a new row using melt() from pandas:

import pandas as pd
pd.melt(df, id_vars=["name"], var_name = "letters")

Which yields:

name   letters   value
n1     date      05/01
n1     date      06/01
n1     date      07/01
n1     A         aaa 
n1     A         aa
n1     A         a
n1     B         bbb   
n1     B         bb
n1     B         b
n1     C         ccc
n1     C         cc
n1     C         c    
...

Now how can I pivot this out by 'date' to get my desired structure?

I wasn't sure how I can apply df.pivot() on the 'date' rows in my 'letters' column. Any suggestions or alternate steps?

Upvotes: 2

Views: 71

Answers (3)

Merlin
Merlin

Reputation: 25629

Try this:

 pd.concat([df[["name"]],df.iloc[:,1:].set_index("date").T.reset_index()],axis= 1 ).rename(columns = {'index':'letter'})

  name letter 07/01 06/01 05/01
0   n1      A     a    aa   aaa
1   n1      B     b    bb   bbb
2   n1      C     c    cc   ccc

Upvotes: 0

piRSquared
piRSquared

Reputation: 294218

I wanted to preserve my other answer, but I think this is a better answer.

tpose = lambda df: df.drop('name', 1).set_index('date').T
df.groupby('name', as_index=True).apply(tpose)

enter image description here

There are tweaks to be made to get to the final format you might want but I think it's uglier.


With tweaks

tpose = lambda df: df.drop('name', 1).set_index('date').T
df1 = df.groupby('name', as_index=True).apply(tpose)
df1.rename_axis(['name', 'letters']).reset_index().rename_axis(None, 1)

enter image description here

This is fully generalizable to whatever length dataset you have and any variable values in 'name'

Upvotes: 0

piRSquared
piRSquared

Reputation: 294218

Try this:

df1 = df.drop('name', 1).set_index('date').rename_axis('letters', 1).sort_index(1, ascending=1).T.reset_index()
df1.set_index(pd.Index(['n1'] * len(df1), name='name')).reset_index().rename_axis(None, 1)

enter image description here

Upvotes: 1

Related Questions