bjurstrs
bjurstrs

Reputation: 1089

How to pick which column to unstack a dataframe on

I have a data set that looks like:

UniqueID CategoryType Value
   A         Cat1      apple
   A         Cat2      banana
   B         Cat1      orange
   C         Cat2      news
   D         Cat1      orange
   D         Cat2      blue

I'd like it to look like:

UniqueID Cat1 Cat2
   A     apple banana
   B     orange
   C           news
   D     orange blue

I've tried using unstack, but can't get the right index set or something.

Thanks

Upvotes: 4

Views: 2652

Answers (5)

BENY
BENY

Reputation: 323326

Take Me so long time to think outside the box :)

index = df.UniqueID.unique()
columns = df.CategoryType.unique()
df1= pd.DataFrame(index=index, columns=columns)
df['match']=df.UniqueID.astype(str)+df.CategoryType
A=dict( zip( df.match, df.Value))
df1.apply(lambda x : (x.index+x.name)).applymap(A.get).replace({None:''})

Out[406]: 
     Cat1    Cat2
A   apple  banana
B  orange        
C            news
D  orange    blue

Upvotes: 0

kjmerf
kjmerf

Reputation: 4345

pivot works just fine:

df = df.pivot(index = "UniqueID", columns = "CategoryType", values = "Value")

Upvotes: 1

piroot
piroot

Reputation: 772

You can use pivot_table with fill_value

df.pivot_table(index='UniqueID', columns='CategoryType', values='Value',
               aggfunc='sum', fill_value='')

CategoryType    Cat1    Cat2
UniqueID
A              apple  banana
B             orange
C                       news
D             orange    blue

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

You can use pivot

Edit: With some more edit and inspiration from @piRsquared's answer,

df.pivot('UniqueID', 'CategoryType', 'Value').replace({None: ''}).rename_axis(None, 1).reset_index()


    UniqueID    Cat1    Cat2
0   A           apple   banana
1   B           orange  
2   C                   news
3   D           orange  blue

Upvotes: 2

piRSquared
piRSquared

Reputation: 294488

The bulk of the work is done with

df.set_index(['UniqueID', 'CategoryType']).Value.unstack(fill_value='')

CategoryType    Cat1    Cat2
UniqueID                    
A              apple  banana
B             orange        
C                       news
D             orange    blue

We can get the rest of the formatting with

df.set_index(['UniqueID', 'CategoryType']).Value.unstack(fill_value='') \
    .rename_axis(None, 1).reset_index()

  UniqueID    Cat1    Cat2
0        A   apple  banana
1        B  orange        
2        C            news
3        D  orange    blue

Upvotes: 4

Related Questions