cph_sto
cph_sto

Reputation: 7585

Transpose of a DataFrame

I have the following DataFrame -

df=pd.DataFrame([[11132,'Jim','Jan',2012,2,'A'],[11132,'Jim','Jan',2012,5,'U'],
[11132,'Jim','Feb',2012,6,'Z'],[11132,'Jim','Feb',2012,4,'A'],
[2755,'Jack','Apr',2012,1,'U'],[2755,'Jack','Apr',2012,3,'A'],[2755,'Jack','Apr',2012,1,'Z'],
[9753,'Sue','Jan',2012,8,'U'],[9753,'Sue','Feb',2012,1,'Z'],[9753,'Sue','Jan',2012,5,'A'],],columns=['ID_Patient', 'Name_Patient','month','year','var','visits'])

df
Out[744]: 
   ID_Patient Name_Patient month  year  var visits
0       11132          Jim   Jan  2012    2      A
1       11132          Jim   Jan  2012    5      U
2       11132          Jim   Feb  2012    6      Z
3       11132          Jim   Feb  2012    4      A
4        2755         Jack   Apr  2012    1      U
5        2755         Jack   Apr  2012    3      A
6        2755         Jack   Apr  2012    1      Z
7        9753          Sue   Jan  2012    8      U
8        9753          Sue   Feb  2012    1      Z
9        9753          Sue   Jan  2012    5      A

How to convert it into:

df_altered
Out[748]: 
   ID_Patient Name_Patient month  year    U    A    Z
0       11132          Jim   Jan  2012  5.0  2.0  NaN
1       11132          Jim   Feb  2012  NaN  4.0  6.0
2        2755         Jack   Apr  2012  1.0  3.0  1.0
3        9753          Sue   Jan  2012  8.0  5.0  NaN
4        9753          Sue   Feb  2012  NaN  NaN  1.0

I was trying to solve it using the following Approach, but to no avail.

.set_index(....).unstack()

Any suggestions on how this can be accomplished using the above aforementioned functions? Thanking you in anticipation

Upvotes: 1

Views: 140

Answers (2)

Zero
Zero

Reputation: 76917

You could use pivot_table like

In [2781]: df.pivot_table(index=['ID_Patient', 'Name_Patient','month','year'], 
                          columns='visits', values='var').reset_index()
Out[2781]:
visits  ID_Patient Name_Patient month  year    A    U    Z
0             2755         Jack   Apr  2012  3.0  1.0  1.0
1             9753          Sue   Feb  2012  NaN  NaN  1.0
2             9753          Sue   Jan  2012  5.0  8.0  NaN
3            11132          Jim   Feb  2012  4.0  NaN  6.0
4            11132          Jim   Jan  2012  2.0  5.0  NaN

Upvotes: 2

jezrael
jezrael

Reputation: 862641

Exactly, you are right - need set_index and unstack:

df1 = (df.set_index(['ID_Patient', 'Name_Patient','month','year','visits'])['var']
         .unstack()
         .reset_index()
         .rename_axis(None, 1))
print (df1)
   ID_Patient Name_Patient month  year    A    U    Z
0        2755         Jack   Apr  2012  3.0  1.0  1.0
1        9753          Sue   Feb  2012  NaN  NaN  1.0
2        9753          Sue   Jan  2012  5.0  8.0  NaN
3       11132          Jim   Feb  2012  4.0  NaN  6.0
4       11132          Jim   Jan  2012  2.0  5.0  NaN

Upvotes: 2

Related Questions