Peter Chung

Reputation: 1122

R columns replace alphabet by other columns in a data frame

I have a data frame with three reference columns (ref, het and hom) in every row. I want to replace the genotype letters in the remaining columns with their complements (G=C, A=T, AG=TC, or vice versa) so that they match the reference columns.

structure(list(SNP = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6", 
"rs7", "rs8", "rs9"), ref = c("GG", "AA", "AA", "GG", "GG", "GG", 
"AA", "CC", "GG"), het = c("AG", "AG", "AG", "AG", "AG", "AG", 
"AG", "AC", "AG"), hom = c("AA", "GG", "GG", "AA", "AA", "AA", 
"GG", "AA", "AA"), A = c("TC", "TC", "CC", "AG", "TT", "TC", 
"AA", "GG", "GG"), B = c("CC", "TT", "CC", "AG", "TT", "CC", 
"AA", "TG", "GG"), C = c("CC", "CC", "CC", "GG", "CC", "TT", 
"AA", "TG", "GG"), D = c("TT", "TC", "CC", "AG", "TT", "TT", 
"AA", "GG", "AG"), E = c("CC", "TT", "CC", "AG", "TC", "TT", 
"AA", "TG", "GG"), F = c("TC", "TT", "TC", "GG", "TC", "TC", 
"AA", "GG", "GG"), G = c("TC", "TC", "CC", "AG", "TC", "TC", 
"AA", "GG", "GG"), H = c("TC", "TC", "TC", "GG", "TC", "TC", 
"AA", "TG", "GG")), .Names = c("SNP", "ref", "het", "hom", "A", 
"B", "C", "D", "E", "F", "G", "H"), class = "data.frame", row.names = 
c(NA, -9L))

Input:
SNP ref het hom A   B   C   D   E   F   G   H   I
rs1 GG  AG  AA  TC  CC  CC  TT  CC  TC  TC  TC  …
rs2 AA  AG  GG  TC  TT  CC  TC  TT  TT  TC  TC  …
rs3 AA  AG  GG  CC  CC  CC  CC  CC  TC  CC  TC  …
rs4 GG  AG  AA  AG  AG  GG  AG  AG  GG  AG  GG  …
rs5 GG  AG  AA  TT  TT  CC  TT  TC  TC  TC  TC  …
rs6 GG  AG  AA  TC  CC  TT  TT  TT  TC  TC  TC  …
rs7 AA  AG  GG  AA  AA  AA  AA  AA  AA  AA  AA  …
rs8 CC  AC  AA  GG  TG  TG  GG  TG  GG  GG  TG  …
rs9 GG  AG  AA  GG  GG  GG  AG  GG  GG  GG  GG  …

Desired Output:
SNP ref het hom A   B   C   D   E   F   G   H   I
rs1 GG  AG  AA  AG  GG  GG  AA  GG  AG  AG  AG  …
rs2 AA  AG  GG  AG  AA  GG  AG  AA  AA  AG  AG  …
rs3 AA  AG  GG  GG  GG  GG  GG  GG  AG  GG  AG  …
rs4 GG  AG  AA  AG  AG  GG  AG  AG  GG  AG  GG  …
rs5 GG  AG  AA  AA  AA  GG  AA  AG  AG  AG  AG  …
rs6 GG  AG  AA  AG  GG  AA  AA  AA  AG  AG  AG  …
rs7 AA  AG  GG  AA  AA  AA  AA  AA  AA  AA  AA  …
rs8 CC  AC  AA  AA  AC  AC  CC  AC  CC  CC  AC  …
rs9 GG  AG  AA  GG  GG  GG  AG  GG  GG  GG  GG  …

How can I write a function to replace these letters based on the reference columns? Thank you.

Upvotes: 2

Views: 74

Answers (2)

Bea

Reputation: 1110

We can create a "dictionary" of all the possible genotypes and their complements, then go through the SNPs row by row and check the first genotype (column A). If it is not among ref/het/hom, we assume that every genotype in that row needs to be changed; otherwise we return the row as it is.

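# Lookup table: each genotype mapped to its complement.
key <- list(AA="TT", TT="AA",
            GG="CC", CC="GG",
            AG="TC", TC="AG",
            GA="CT", CT="GA",
            AC="TG", TG="AC",
            CA="GT", GT="CA")

# myrow is one row of df as a character vector: SNP, ref, het, hom, then the genotypes.
changeAlleles <- function(myrow) {
  # If column A (element 5) is not among ref/het/hom, complement every genotype in the row.
  if (!(myrow[5] %in% myrow[2:4])) {
    myrow <- c(myrow[1:4], sapply(myrow[5:length(myrow)], function(x) key[[x]]))
  }
  return(myrow)
}

# apply() returns the processed rows as columns, so transpose back before converting.
df2 <- as.data.frame(t(apply(df, 1, changeAlleles)))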

   SNP ref het hom  A  B  C  D  E  F  G  H
2  rs1  GG  AG  AA AG GG GG AA GG AG AG AG
3  rs2  AA  AG  GG AG AA GG AG AA AA AG AG
4  rs3  AA  AG  GG GG GG GG GG GG AG GG AG
5  rs4  GG  AG  AA AG AG GG AG AG GG AG GG
6  rs5  GG  AG  AA AA AA GG AA AG AG AG AG
7  rs6  GG  AG  AA AG GG AA AA AA AG AG AG
8  rs7  AA  AG  GG AA AA AA AA AA AA AA AA
9  rs8  CC  AC  AA CC AC AC CC AC CC CC AC
10 rs9  GG  AG  AA GG GG GG AG GG GG GG GG
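The same row-wise check can probably be done without a lookup list, since each entry in the key is just the base-by-base complement of its name. Below is a minimal sketch under that assumption. The names `complement`, `snps`, `geno_cols` and `needs_flip` are made up for illustration, and the small data frame holds three rows copied from the question rather than the full data.

```r
# Hypothetical helper: complement each base (A<->T, C<->G), character by character.
complement <- function(x) chartr("ACGT", "TGCA", x)

# Small illustrative data frame: rows rs1, rs4 and rs8 from the question, columns A and B only.
snps <- data.frame(
  SNP = c("rs1", "rs4", "rs8"),
  ref = c("GG", "GG", "CC"),
  het = c("AG", "AG", "AC"),
  hom = c("AA", "AA", "AA"),
  A   = c("TC", "AG", "GG"),
  B   = c("CC", "AG", "TG"),
  stringsAsFactors = FALSE
)

# Genotype columns are everything after SNP, ref, het, hom.
geno_cols <- 5:ncol(snps)

# A row is flagged when its column A genotype matches none of ref/het/hom.
needs_flip <- snps$A != snps$ref & snps$A != snps$het & snps$A != snps$hom

# Complement only the flagged rows, one genotype column at a time.
snps[geno_cols] <- lapply(snps[geno_cols], function(col) {
  col[needs_flip] <- complement(col[needs_flip])
  col
})

snps
```

Like the function above, this only looks at column A to decide whether a row is flipped, so it shares the same assumption that the first genotype is representative of the whole row.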

Upvotes: 2

akrun

Reputation: 887193

We can use chartr to translate T to A and C to G in the genotype columns:

df1[5:12] <- lapply(df1[5:12], function(x) chartr('TC', 'AG', x))
df1
#  SNP ref het hom  A  B  C  D  E  F  G  H I
#1 rs1  GG  AG  AA AG GG GG AA GG AG AG AG …
#2 rs2  AA  AG  GG AG AA GG AG AA AA AG AG …
#3 rs3  AA  AG  GG GG GG GG GG GG AG GG AG …
#4 rs4  GG  AG  AA AG AG GG AG AG GG AG GG …
#5 rs5  GG  AG  AA AA AA GG AA AG AG AG AG …
#6 rs6  GG  AG  AA AG GG AA AA AA AG AG AG …
#7 rs7  AA  AG  GG AA AA AA AA AA AA AA AA …
#8 rs8  CC  AC  AA GG AG AG GG AG GG GG AG …
#9 rs9  GG  AG  AA GG GG GG AG GG GG GG GG …
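To see why row rs8 comes out as GG/AG here rather than the CC/AC values in the desired output, it may help to look at chartr on a few genotypes by themselves. This is only an illustrative sketch; the vector `genotypes` and the names `partial` and `full` are made up for the example.

```r
# chartr(old, new, x) replaces each character of `old` with the character
# at the same position in `new`; all other characters are left unchanged.
genotypes <- c("TC", "CC", "TG", "GG")

# The mapping used above: T -> A and C -> G; A and G are not touched.
partial <- chartr("TC", "AG", genotypes)
partial
# "AG" "GG" "AG" "GG"

# A full complement for comparison: A <-> T and C <-> G.
full <- chartr("ACGT", "TGCA", genotypes)
full
# "AG" "GG" "AC" "CC"
```

Note that applying the full complement to every row would also change rows that already match their reference columns (rs4, rs7 and rs9 in the question), so it would need to be restricted to the rows that actually require flipping.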

Upvotes: 1
