The Nightman
The Nightman

Reputation: 5759

Making comparisons between Pandas columns

I know I have read a way to do what I am looking for but I can't seem to find it.

I have a pandas DataFrame that looks something like this:

       Chrom        Loc WT Var Change ConvChange  AO     DP          VAF  \
0       chr1  115227855  T   A    T>A        T>A   5  19346  0.000258451   

      IntEx   Gene Upstream Downstream Individual  
0      TIII  TIIIa        T          C          1

I would like to understand across Individual where the greatest differences in VAF exist at each unique Chrom, Loc and Change combination.

I was thinking of changing the current DataFrame so it looks something like below to give me standard deviation column that could be sorted to give me the locations with the greatest differences. Is this a good approach and is there a good way to do something like this?

                                1              2              3           Mean             Std
chr1-115227855-T>A    0.000258451    0.000548128    0.000789456    0.000532011    0.0002170812

Upvotes: 0

Views: 52

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

You can use some Pandas reshaping:

MCVE:

Given:

print(df)

  Chrom  Individual       VAF Var WT
0  chr1           1  0.076397   A  T
1  chr1           2  0.964344   A  T
2  chr1           3  0.563713   A  T

Reshape and aggregate:

df.set_index(['Chrom','WT','Var','Individual'])['VAF'].unstack(-1)\
  .pipe(lambda x: x.assign(mean=x.mean(1), std=x.std(1)))

Output:

Individual           1         2         3      mean       std
Chrom WT Var                                                  
chr1  T  A    0.076397  0.964344  0.563713  0.534818  0.444678

Upvotes: 1

Related Questions