Reputation: 1737
I have a dataframe like this and can group it by library and sample columns and create new columns:
df = pd.DataFrame({'barcode': ['b1', 'b2','b1','b2','b1',
'b2','b1','b2'],
'library': ['l1', 'l1','l1','l1','l2', 'l2','l2','l2'],
'sample': ['s1','s1','s2','s2','s1','s1','s2','s2'],
'category': ['c1', 'c2','c1','c2','c1', 'c2','c1','c2'],
'count': [10,21,13,54,51,16,67,88]})
df
barcode library sample category count
0 b1 l1 s1 c1 10
1 b2 l1 s1 c2 21
2 b1 l1 s2 c1 13
3 b2 l1 s2 c2 54
4 b1 l2 s1 c1 51
5 b2 l2 s1 c2 16
6 b1 l2 s2 c1 67
7 b2 l2 s2 c2 88
I used grouping to reduce dimentions of the df:
grp=df.groupby(['library','sample'])
df=grp.get_group(('l1','s1')).rename(columns={"count":
"l1_s1_count"}).reset_index(drop=True)
df['l1_s2_count']=grp.get_group(('l1','s2'))[['count']].values
df['l2_s1_count']=grp.get_group(('l2','s1'))[['count']].values
df['l2_s2_count']=grp.get_group(('l2','s2'))[['count']].values
df=df.drop(['sample','library'],axis=1)
result
barcode category l1_s1_count l1_s2_count l2_s1_count
l2_s2_count
0 b1 c1 10 13 51 67
1 b2 c2 21 54 16 88
I think there should be a neater way for this transformation, like using pivot table which I failed with, could you please suggest how this could be done with pivot table? thanks.
Upvotes: 2
Views: 111
Reputation: 1033
try pivot_table function as below, it will produce multi-index result, which will need to be flattened.
df2 = pd.pivot_table(df,index=['barcode', 'category'], columns= ['sample', 'library'], values='count').reset_index()
df2.columns = ["_".join(a) for a in df2.columns.to_flat_index()]
out:
barcode_ category_ s1_l1 s1_l2 s2_l1 s2_l2
0 b1 c1 10 51 13 67
1 b2 c2 21 16 54 88
or even without , values='count'
.
df2 = pd.pivot_table(df,index=['barcode', 'category'], columns= ['sample', 'library']).reset_index()
df2.columns = ["_".join(a) for a in df2.columns.to_flat_index()]
out:
barcode__ category__ count_s1_l1 count_s1_l2 count_s2_l1 count_s2_l2
0 b1 c1 10 51 13 67
1 b2 c2 21 16 54 88
as per your preference
Upvotes: 2