R code challenge: retrieving the values in matching columns and sum them up with matching rows

Question

I am stuck on a problem in R. I have a data frame called testa (dput included below). For each row, I need to match the letter in the ALT column against the column names (A, C, G, T, N) and retrieve the value in the matching column, do the same for the letter in REF, and combine the two into the result ad.new (my code already does this).

However, I need to extend this code to handle the row whose TYPE value ends with flat. For that row, I need to match its start id (chr10:102053031) against the other ids in the start column. Where they match, I need to sum the ALT values those rows pick up from the A, C, G, T, N columns and use that sum, together with the REF value, as the ad.new entry for the flat row.

Running the dput and my code should make this clearer. In short, I want to match the letters in the REF and ALT columns against the columns A, C, G, T, N, retrieve the corresponding values, and join the REF and ALT values with a comma. For the flat row in this example, however, I want to take the ALT values of the other rows with the same start id (6 from column A in one row and 7 from column G in the other) and add them together to give 13. So for the flat row my result should be 0,13.
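To make the flat-row arithmetic concrete, here is a minimal sketch using only the three rows that share the start id chr10:102053031 (rows 2, 6 and 7 of testa). The values are hand-copied from the dput, and the vector names (start, alt.count, is.flat) are made up for illustration; they are not columns of testa.

```r
# Rows 2, 6 and 7 of testa: the only rows with start id chr10:102053031.
# alt.count holds the count found in the column named by each row's ALT letter.
start     <- c("chr10:102053031", "chr10:102053031", "chr10:102053031")
alt       <- c("A", "G", "G")
alt.count <- c(6, 0, 7)            # A = 6 (row 2), G = 0 (row 6), G = 7 (row 7)
is.flat   <- c(FALSE, TRUE, FALSE) # row 6 is the flat row

# Sum the ALT counts of the non-flat rows that share the flat row's start id.
flat.start <- start[is.flat]
flat.sum   <- sum(alt.count[start == flat.start & !is.flat])

# The expected result uses 0 as the REF part of the flat row.
flat.result <- paste(0, flat.sum, sep = ",")
print(flat.result)
```

The sum covers the two non-flat rows only, so the flat row's own ALT count does not contribute.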

The expected result is also shown below.

My incomplete code:

# treat missing values as zero
testa[is.na(testa)]<-0
# pick one column per row by that row's REF letter;
# the diagonal then pairs row i with its own letter
ref.counts<-testa[,testa[,"REF"]]
ref.counts<-as.matrix(ref.counts)
ref.counts[is.na(ref.counts)]<-0
ref.counts<-diag(ref.counts)

# same lookup for the ALT letters
alt.counts<-testa[,testa[,"ALT"]]
alt.counts<-as.matrix(alt.counts)
alt.counts[is.na(alt.counts)]<-0
alt.counts<-diag(alt.counts)

#############
##need to extend this code here
#############
ad.new<-paste(ref.counts,alt.counts,sep=",")

dput for testa:

structure(c("chr10:101544447", "chr10:102053031", "chr10:102778767", 
"chr10:102789831", "chr10:102989480", "chr10:102053031", "chr10:102053031", 
"0", "6", "0", "0", "0", "0", "0", "0", "34", "24", "0", "0", 
"34", "34", "0", "0", "0", "0", "0", "0", "7", "53", "0", "0", 
"30", "12", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", 
"chr10", "chr10", "chr10", "chr10", "chr10", "chr10", "chr10", 
"101544447", "102053031", "102778767", "102789831", "102989480", 
"102053031", "102053031", "A", "C", "C", "C", "C", "C", "C", 
"T", "A", "T", "T", "T", "G", "G", "snp", "snp", "snp", "snp", 
"snp", "snp:102053031:flat", "snp", "nonsynonymous SNV", 
"intronic", "nonsynonymous SNV", "nonsynonymous SNV", "ncRNA_exonic", 
"intronic", "intronic", "ABCC2:NM_000392:exon2:c.A116T:p.Y39F,", 
"PKD2L1", "PDZD7:NM_024895:exon8:c.G1136A:p.R379Q,PDZD7:NM_001195263:exon8:c.G1136A:p.R379Q,", 
"PDZD7:NM_024895:exon2:c.G146A:p.R49Q,PDZD7:NM_001195263:exon2:c.G146A:p.R49Q,", 
"LBX1-AS1", "PKD2L1", "PKD2L1"), .Dim = c(7L, 15L), .Dimnames = list(
    c("1", "2", "3", "4", "5", "6", "7"), c("start", "A", "C", 
    "G", "T", "N", "=", "-", "chr", "end", "REF", "ALT", "TYPE", 
    "refGene::location", "refGene::type")))

Expected result

 ad.new
"0,53"
"34,6"
"24,0"
"0,30"
"0,12"
"0,13" 
"34,7"

digEmAll · Accepted Answer

Something like this should work:

# apply the "normal" rule (not considering flat exceptions)
alts <- as.numeric(diag(testa[,testa[,"ALT"]]))
refs <- as.numeric(diag(testa[,testa[,"REF"]]))
res <- paste(refs,alts,sep=",")

# replace lines having TYPE ending with "flat"
flats <- grep('.*flat$',testa[,"TYPE"])
res[flats] <- 
unlist(lapply(flats,function(x){
                startId <- testa[x,"start"]
                # rows sharing the start id, excluding the flat row itself
                selection <- setdiff(which(testa[,"start"] == startId),x)
                paste0("0,",sum(alts[selection]))
             }))

ad.new <- as.matrix(res)
ad.new
#      [,1]  
# [1,] "0,53"
# [2,] "34,6"
# [3,] "24,0"
# [4,] "0,30"
# [5,] "0,12"
# [6,] "0,13"
# [7,] "34,7"
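
As a possible alternative to the diag() lookup, the per-row counts can also be read directly with a two-column (row, column) index matrix, which avoids building the intermediate 7 x 7 matrix. The following is only a sketch under stated assumptions: testa is rebuilt here as a reduced character matrix holding just the columns the calculation needs, count_for is a hypothetical helper name, and the REF part of a flat row is hard-coded to 0 as in the expected result.

```r
# Reduced copy of testa (assumption: only the columns used below are kept).
testa <- cbind(
  start = c("chr10:101544447", "chr10:102053031", "chr10:102778767",
            "chr10:102789831", "chr10:102989480", "chr10:102053031",
            "chr10:102053031"),
  A = c("0", "6", "0", "0", "0", "0", "0"),
  C = c("0", "34", "24", "0", "0", "34", "34"),
  G = c("0", "0", "0", "0", "0", "0", "7"),
  T = c("53", "0", "0", "30", "12", "0", "0"),
  N = rep("0", 7),
  REF  = c("A", "C", "C", "C", "C", "C", "C"),
  ALT  = c("T", "A", "T", "T", "T", "G", "G"),
  TYPE = c("snp", "snp", "snp", "snp", "snp", "snp:102053031:flat", "snp")
)

# Hypothetical helper: for every row, return the count stored in the column
# whose name equals the letter found in letter.col (e.g. "REF" or "ALT").
count_for <- function(m, letter.col) {
  rows <- seq_len(nrow(m))
  cols <- match(m[, letter.col], colnames(m))
  as.numeric(m[cbind(rows, cols)])  # element-wise (row, column) lookup
}

refs <- count_for(testa, "REF")
alts <- count_for(testa, "ALT")
res  <- paste(refs, alts, sep = ",")

# Flat rows: sum the ALT counts of the other rows with the same start id.
flats <- grep("flat$", testa[, "TYPE"])
for (x in flats) {
  same.start <- setdiff(which(testa[, "start"] == testa[x, "start"]), x)
  res[x] <- paste0("0,", sum(alts[same.start]))
}

print(res)
```

On this data, res holds the same seven strings as the expected ad.new. The index-matrix lookup scales with the number of rows rather than its square, which may matter for larger tables.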
