teteh May
teteh May

Reputation: 455

Python: keep index duplicate for unstacking

I have the below data frame:

df = pd.DataFrame([['A',1,2],['B',1,5],['A',2,5],['C',1,8],['A',1,5]], columns = ['person','status','result'])
  person  status  result
0      A       1       2
1      B       1       5
2      A       2       5
3      C       1       8
4      A       1       5

My aim: I need to make person and status as index and check the result of each index. However, if there is index duplicate, I want to keep it such that person A with status 1 has 2 results 2 and 5

status    1    2
person          
A       2.0  5.0
A       5.0
B       5.0  NaN
C       8.0  NaN

My attempt:

df1.set_index(['person','status'])['result'].unstack()

But this does not work as duplicate of index is not allowed.

Upvotes: 1

Views: 101

Answers (1)

sammywemmy
sammywemmy

Reputation: 28709

Bit of a hack, here goes:

df = pd.DataFrame([['A',1,2],['B',1,5],['A',2,5],['C',1,8],['A',1,5]], columns 
   = ['person','status','result'])

df

  person    status  result
0     A       1     2
4     A       1     5
2     A       2     5
1     B       1     5
3     C       1     8


M=(df
  .assign(result= lambda x: x.result.astype('str'))
  .groupby(['person','status'],as_index=False)
  .agg(lambda x: x.str.cat(sep=','))
  .pivot(index='person',columns='status')
  )

#https://stackoverflow.com/a/55757002/7175713
M.columns = M.columns.to_flat_index()

(M.reset_index()
 .assign(exp = lambda x: x[('result',1)].str.split(','))
 .explode('exp')
 .drop(('result',1),axis=1)
 .rename(columns={('result',2):'2','exp':'1'})
 .reset_index(drop=True)
 .loc[:,['person','1','2']]
)


    person  1   2
  0     A   2   5
  1     A   5   5
  2     B   5   NaN
  3     C   8   NaN

Upvotes: 1

Related Questions