Hjorvik

Reputation: 61

Python - Get the Indexes of a value on Pandas' Apply function

I need to recode some haplotypes. I have them in a Pandas DataFrame of 305 rows and 129902 columns, and it looks like this (only one column and 20 rows):

rs#                                                  rs12914615  
SNPalleles                                                  C/T  
chrom                                                     chr15  
pos                                                    98259206  
strand                                                        +  
genome_build                                           ncbi_B36  
center                                               affymetrix  
protLSID      urn:LSID:affymetrix.hapmap.org:Protocol:Genome...  
assayLSID     urn:LSID:affymetrix.hapmap.org:Assay:SNP_A-837...  
panelLSID         urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1  
QC_code                                                     QC+  
NA06985                                                      CT  
NA06991                                                      CT  
NA06993                                                      CT  
NA06993.dup                                                  CC  
NA06994                                                      CC  
NA07000                                                      CC  
NA07019                                                      CT  
NA07022                                                      CT  

The idea is to check, for each individual (the NA06... rows), whether both nucleotides match the wildtype (the first letter of the SNPalleles row) and, if not, to code the genotype accordingly.

My problem is that I don't know how to iterate over the data frame while referring to its wildtype, which is in another row of the same column.

The output should look something like this:

NA06985                                                      1  
NA06991                                                      1  
NA06993                                                      1  
NA06993.dup                                                  0  
NA06994                                                      0  
NA07000                                                      0  
NA07019                                                      1  
NA07022                                                      1

Here 0 is the wildtype (CC for this gene), 1 is the heterozygote (CT), and 2 is the mutant homozygote (TT).

Thanks for the help.

Upvotes: 1

Views: 164

Answers (1)

piRSquared

Reputation: 294288

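# Keep only the sample rows (index labels containing 'NA'), then flag the
# genotypes equal to the allele string with the slash removed ('C/T' -> 'CT').
df.filter(
    like='NA', axis=0
).eq(df.loc['SNPalleles'].str.replace('/', '')).astype(int)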

             rs12914615
rs#                    
NA06985               1
NA06991               1
NA06993               1
NA06993.dup           0
NA06994               0
NA07000               0
NA07019               1
NA07022               1
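
Note that the comparison above only separates genotypes equal to the allele string (CT) from everything else, so a TT sample would come out as 0 rather than 2. If the full 0/1/2 coding from the question is needed, one option is to count the alleles that differ from the wildtype. The following is only a minimal sketch under some assumptions: the frame, the sample names (NA00001 and so on) and the helper `recode_column` are made up for illustration, every genotype is assumed to be a two-letter string, and missing calls are not handled.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question: one SNP column,
# metadata rows on top, made-up sample rows (NA...) below.
df = pd.DataFrame(
    {"rs12914615": ["C/T", "chr15", "CT", "CC", "TT"]},
    index=["SNPalleles", "chrom", "NA00001", "NA00002", "NA00003"],
)

def recode_column(col: pd.Series) -> pd.Series:
    """Count the alleles in each genotype that differ from the wildtype."""
    wildtype = col["SNPalleles"][0]  # first letter of e.g. 'C/T'
    samples = col[col.index.str.startswith("NA")]  # sample rows only
    return samples.map(lambda g: sum(allele != wildtype for allele in g))

# apply() passes each column to recode_column, so the wildtype is looked up
# in the same column as the genotypes being recoded.
coded = df.apply(recode_column)
print(coded)
```

Here CT gives 1, CC gives 0 and TT gives 2. A per-column `apply` with a Python lambda may be slow on roughly 130,000 columns, so this is meant to show the logic rather than to be the fastest approach.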

Upvotes: 1
