Reputation: 169
I have a df that looks like this
df
sample total count base positive negative group
A 698 696 A 347 349 group1
A 698 0 C 0 0 group1
A 698 1 G 0 1 group1
A 698 0 T 0 0 group1
A 698 0 N 0 0 group1
A 698 22 +G 14 8 group1
A 698 1 +GG 0 1 group1
A 698 1 -A 1 0 group1
A 707 1 A 1 0 group1
B 707 1 C 1 0 group1
B 707 693 G 340 353 group1
B 707 0 T 0 0 group1
B 707 0 N 0 0 group1
B 707 1 +A 0 1 group1
B 707 11 -G 6 5 group1
B 707 1 -GG 0 1 group1
A 814 804 A 417 387 group2
A 814 2 C 1 1 group2
A 814 7 G 2 5 group2
A 814 1 T 1 0 group2
A 814 0 N 0 0 group2
A 814 225 +G 114 111 group2
A 814 7 +GG 4 3 group2
B 826 0 A 0 0 group2
B 826 0 C 0 0 group2
B 826 815 G 416 399 group2
B 826 0 T 0 0 group2
B 826 0 N 0 0 group2
B 826 11 -G 6 5 group2
I would like to convert/pivot it to df with headers given below
group sample total +A +G +GG -A -G -GG A C G N T
group1 A 698 0 22 1 1 0 0 696 0 0 0 0
group1 B 707 1 0 0 0 11 1 1 1 693 0 0
group2 A 814 0 225 7 0 0 0 804 2 7 0 1
group2 B 826 0 0 0 0 11 0 0 0 815 0 0
So far adopting code shown here pandas row values to column headers
Using code:
_df = df.groupby(['group', 'base']).size().unstack(fill_value=0).reset_index().rename_axis(None, axis=1)
Output
_df
group +A +G +GG -A -G -GG A C G N T
0 group1 1 1 1 1 1 1 2 2 2 2 2
1 group2 0 1 1 0 1 0 2 2 2 2 2
I am not sure what inbuilt command does pandas have where i can actually get the number from original dataframe and not 0,1 also I miss the total column.
Upvotes: 0
Views: 520
Reputation: 30605
Use pivot table like this
_df = df.pivot_table(index=['group','sample','total'],columns=['base'],values='count').fillna(0)
Output:
base +A +G +GG -A -G -GG A C G N \ group sample total group1 A 698 0.0 22.0 1.0 1.0 0.0 0.0 696.0 0.0 1.0 0.0 707 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 B 707 1.0 0.0 0.0 0.0 11.0 1.0 0.0 1.0 693.0 0.0 group2 A 814 0.0 225.0 7.0 0.0 0.0 0.0 804.0 2.0 7.0 0.0 B 826 0.0 0.0 0.0 0.0 11.0 0.0 0.0 0.0 815.0 0.0 base T group sample total group1 A 698 0.0 707 0.0 B 707 0.0 group2 A 814 1.0 B 826 0.0
Upvotes: 3
Reputation: 433
pd.pivot_table(_df, columns=['base'], values=['size'],index='index_name').fillna(0)
Upvotes: 1