NamAshena
NamAshena

Reputation: 1737

applying pivot table on pandas dataframe instead of grouping

I have a dataframe like this and can group it by library and sample columns and create new columns:

df = pd.DataFrame({'barcode': ['b1', 'b2','b1','b2','b1', 
'b2','b1','b2'],
'library': ['l1', 'l1','l1','l1','l2', 'l2','l2','l2'],
'sample': ['s1','s1','s2','s2','s1','s1','s2','s2'],
'category': ['c1', 'c2','c1','c2','c1', 'c2','c1','c2'],
'count': [10,21,13,54,51,16,67,88]})

df

   barcode library sample category  count
0      b1      l1     s1       c1     10
1      b2      l1     s1       c2     21
2      b1      l1     s2       c1     13
3      b2      l1     s2       c2     54
4      b1      l2     s1       c1     51
5      b2      l2     s1       c2     16
6      b1      l2     s2       c1     67
7      b2      l2     s2       c2     88

I used grouping to reduce dimentions of the df:

grp=df.groupby(['library','sample'])

df=grp.get_group(('l1','s1')).rename(columns={"count": 
"l1_s1_count"}).reset_index(drop=True)
df['l1_s2_count']=grp.get_group(('l1','s2'))[['count']].values
df['l2_s1_count']=grp.get_group(('l2','s1'))[['count']].values
df['l2_s2_count']=grp.get_group(('l2','s2'))[['count']].values

df=df.drop(['sample','library'],axis=1)

result

barcode category    l1_s1_count l1_s2_count l2_s1_count  
l2_s2_count
0   b1  c1  10  13  51  67
1   b2  c2  21  54  16  88

I think there should be a neater way for this transformation, like using pivot table which I failed with, could you please suggest how this could be done with pivot table? thanks.

Upvotes: 2

Views: 111

Answers (1)

NoobVB
NoobVB

Reputation: 1033

try pivot_table function as below, it will produce multi-index result, which will need to be flattened.

df2 = pd.pivot_table(df,index=['barcode', 'category'], columns= ['sample', 'library'], values='count').reset_index()
df2.columns = ["_".join(a) for a in df2.columns.to_flat_index()]

out:

  barcode_ category_  s1_l1  s1_l2  s2_l1  s2_l2
0       b1        c1     10     51     13     67
1       b2        c2     21     16     54     88

or even without , values='count'.

df2 = pd.pivot_table(df,index=['barcode', 'category'], columns= ['sample', 'library']).reset_index()
df2.columns = ["_".join(a) for a in df2.columns.to_flat_index()]

out:

 barcode__ category__  count_s1_l1  count_s1_l2  count_s2_l1  count_s2_l2
0        b1         c1           10           51           13           67
1        b2         c2           21           16           54           88

as per your preference

Upvotes: 2

Related Questions