sbradbio
sbradbio

Reputation: 169

multi row as column header using pandas

I have a df that looks like this

df
    sample  total   count   base    positive    negative    group
    A   698 696 A   347 349 group1
    A   698 0   C   0   0   group1
    A   698 1   G   0   1   group1
    A   698 0   T   0   0   group1
    A   698 0   N   0   0   group1
    A   698 22  +G  14  8   group1
    A   698 1   +GG 0   1   group1
    A   698 1   -A  1   0   group1
    A   707 1   A   1   0   group1
    B   707 1   C   1   0   group1
    B   707 693 G   340 353 group1
    B   707 0   T   0   0   group1
    B   707 0   N   0   0   group1
    B   707 1   +A  0   1   group1
    B   707 11  -G  6   5   group1
    B   707 1   -GG 0   1   group1
    A   814 804 A   417 387 group2
    A   814 2   C   1   1   group2
    A   814 7   G   2   5   group2
    A   814 1   T   1   0   group2
    A   814 0   N   0   0   group2
    A   814 225 +G  114 111 group2
    A   814 7   +GG 4   3   group2
    B   826 0   A   0   0   group2
    B   826 0   C   0   0   group2
    B   826 815 G   416 399 group2
    B   826 0   T   0   0   group2
    B   826 0   N   0   0   group2
    B   826 11  -G  6   5   group2

I would like to convert/pivot it to df with headers given below

expected output

group   sample  total   +A  +G  +GG -A  -G  -GG A   C   G   N   T
group1  A   698 0   22  1   1   0   0   696 0   0   0   0
group1  B   707 1   0   0   0   11  1   1   1   693 0   0
group2  A   814 0   225 7   0   0   0   804 2   7   0   1
group2  B   826 0   0   0   0   11  0   0   0   815 0   0

So far adopting code shown here pandas row values to column headers

Using code:

_df = df.groupby(['group', 'base']).size().unstack(fill_value=0).reset_index().rename_axis(None, axis=1)

Output

 _df
    group   +A  +G  +GG -A  -G  -GG A   C   G   N   T
0   group1  1   1   1   1   1   1   2   2   2   2   2
1   group2  0   1   1   0   1   0   2   2   2   2   2

I am not sure what inbuilt command does pandas have where i can actually get the number from original dataframe and not 0,1 also I miss the total column.

Upvotes: 0

Views: 520

Answers (2)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

Use pivot table like this

_df = df.pivot_table(index=['group','sample','total'],columns=['base'],values='count').fillna(0)

Output:

base                  +A     +G  +GG   -A    -G  -GG      A    C      G    N  \
group  sample total                                                            
group1 A      698    0.0   22.0  1.0  1.0   0.0  0.0  696.0  0.0    1.0  0.0   
              707    0.0    0.0  0.0  0.0   0.0  0.0    1.0  0.0    0.0  0.0   
       B      707    1.0    0.0  0.0  0.0  11.0  1.0    0.0  1.0  693.0  0.0   
group2 A      814    0.0  225.0  7.0  0.0   0.0  0.0  804.0  2.0    7.0  0.0   
       B      826    0.0    0.0  0.0  0.0  11.0  0.0    0.0  0.0  815.0  0.0   

base                   T  
group  sample total       
group1 A      698    0.0  
              707    0.0  
       B      707    0.0  
group2 A      814    1.0  
       B      826    0.0  

Upvotes: 3

ashish trehan
ashish trehan

Reputation: 433

pd.pivot_table(_df, columns=['base'], values=['size'],index='index_name').fillna(0)

Upvotes: 1

Related Questions