SirC
SirC

Reputation: 2209

Manipulate pandas dataframe

I have the following pandas dataframe:

        colA   colB   
idcode
3223      s1   13
3223      s2   18
211       s1   24
211       s2   43
211       s3  198

where colA and colB are the columns and idcode is the index for rows. I would like to have as output a df as follows:

        s1   s2  s3
idcode  
3223    13   18  nan(or 0 but I'd prefer nan)
211     24   43  198 

How can I do this?

Upvotes: 2

Views: 49

Answers (2)

piRSquared
piRSquared

Reputation: 294488

Use set_index and unstack

df.set_index('colA', append=True).colB.unstack()

colA      s1    s2     s3
idcode                   
211     24.0  43.0  198.0
3223    13.0  18.0    NaN

To get rid of the colA label on the columns object.

df.set_index('colA', append=True).colB.unstack().rename_axis(None, 1)

          s1    s2     s3
idcode                   
211     24.0  43.0  198.0
3223    13.0  18.0    NaN

What if idcode/colA pairs are not unique?
Then we have to aggregate.

pandas 0.20
groupby

#                          This could be any thing
#                          that aggregates, mean, sum, etc.
#                                   /---\
df.groupby(['idcode', 'colA']).colB.first().unstack()

colA      s1    s2     s3
idcode                   
211     24.0  43.0  198.0
3223    13.0  18.0    NaN

pivot_table

df.pivot_table('colB', index='idcode', columns='colA', aggfunc='first')

colA      s1    s2     s3
idcode                   
211     24.0  43.0  198.0
3223    13.0  18.0    NaN

Upvotes: 2

Andrew L
Andrew L

Reputation: 7038

Here's another way using pivot():

df.pivot(columns='colA', values='colB')
colA      s1    s2     s3
idcode                   
211     24.0  43.0  198.0
3223    13.0  18.0    NaN

Upvotes: 2

Related Questions