I have a problem solving this in R. I have an object called testa (a character matrix; dput included below). For each row, I need to match the letter in column ALT with the column names (A, C, G, T, N) and get the corresponding value from that column, do the same for the letter in REF, and combine the two into the result ad.new (my code does this job).
However, I need to extend this code to handle the line where the TYPE column has flat at the end. For the flat row, I need to match its start id (chr10:102053031) against the other ids in the start column. Where they match, I need to sum the corresponding ALT values from the A, C, G, T, N columns and put that sum, along with the REF value, into the ad.new entry for the flat line.
If you run the dput and my code, you will see what I mean. Basically, I want to match the letters in the REF and ALT columns, get the corresponding values from the columns A, C, G, T, N, and join the REF and ALT values with a comma. However, for the flat line in this example, I want to take the value in column A from the row whose start id matches that of the flat line (6 in this case) and the value from the other matching row (7, from column G), and sum them to give 13. So for the flat line my result should be 0,13.
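A minimal sketch of the flat-line arithmetic described above, assuming the flat row should sum the ALT counts of every *other* row with the same start id. It uses a trimmed copy of testa (only the columns needed, values copied from the dput); the REF part is written as 0 only because that is what the expected result shows.

```r
# Trimmed copy of testa: only the columns used here, values taken from the dput
testa <- cbind(
  start = c("chr10:101544447", "chr10:102053031", "chr10:102778767",
            "chr10:102789831", "chr10:102989480", "chr10:102053031",
            "chr10:102053031"),
  A    = c("0", "6", "0", "0", "0", "0", "0"),
  C    = c("0", "34", "24", "0", "0", "34", "34"),
  G    = c("0", "0", "0", "0", "0", "0", "7"),
  T    = c("53", "0", "0", "30", "12", "0", "0"),
  REF  = c("A", "C", "C", "C", "C", "C", "C"),
  ALT  = c("T", "A", "T", "T", "T", "G", "G"),
  TYPE = c("snp", "snp", "snp", "snp", "snp", "snp:102053031:flat", "snp")
)

# ALT count of every row: the value in the column named by that row's ALT letter
alts <- as.numeric(testa[cbind(seq_len(nrow(testa)),
                               match(testa[, "ALT"], colnames(testa)))])

flat_row <- grep("flat$", testa[, "TYPE"])   # row 6 in this data

# other rows sharing the flat row's start id (the flat row itself is excluded)
same_start <- setdiff(which(testa[, "start"] == testa[flat_row, "start"]), flat_row)
print(same_start)        # rows 2 and 7
print(alts[same_start])  # 6 (column A) and 7 (column G)

# REF part hard-coded to 0 to match the expected result
flat_value <- paste0("0,", sum(alts[same_start]))
print(flat_value)        # "0,13"
```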
The expected result is shown below.
My incomplete code:
testa[is.na(testa)] <- 0
# for each row, pick the column named by its REF letter;
# the diagonal of the resulting square matrix holds each row's own value
ref.counts <- testa[, testa[, "REF"]]
ref.counts <- as.matrix(ref.counts)
ref.counts[is.na(ref.counts)] <- 0
ref.counts <- diag(ref.counts)
# same lookup for the ALT letter
alt.counts <- testa[, testa[, "ALT"]]
alt.counts <- as.matrix(alt.counts)
alt.counts[is.na(alt.counts)] <- 0
alt.counts <- diag(alt.counts)
#############
## need to extend this code here
#############
ad.new <- paste(ref.counts, alt.counts, sep = ",")
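The diag() step above first builds an n-by-n matrix and then keeps only its diagonal. A possible alternative, sketched here under the assumption that every REF/ALT letter is also a column name, is to index the matrix with a two-column (row, column) matrix so each row's value is fetched directly. The helper name `row_col_value` is hypothetical, and testa is a trimmed copy of the dput. Like the code above, this does not yet apply the flat rule, so row 6 still comes out as "34,0".

```r
# Trimmed copy of testa: only the columns used here, values taken from the dput
testa <- cbind(
  A   = c("0", "6", "0", "0", "0", "0", "0"),
  C   = c("0", "34", "24", "0", "0", "34", "34"),
  G   = c("0", "0", "0", "0", "0", "0", "7"),
  T   = c("53", "0", "0", "30", "12", "0", "0"),
  REF = c("A", "C", "C", "C", "C", "C", "C"),
  ALT = c("T", "A", "T", "T", "T", "G", "G")
)

# Hypothetical helper: for row i, return m[i, <column named col_letters[i]>]
row_col_value <- function(m, col_letters) {
  idx <- cbind(seq_len(nrow(m)), match(col_letters, colnames(m)))
  as.numeric(m[idx])   # matrix indexing returns one element per (row, col) pair
}

ref.counts <- row_col_value(testa, testa[, "REF"])
alt.counts <- row_col_value(testa, testa[, "ALT"])
ad.new <- paste(ref.counts, alt.counts, sep = ",")
print(ad.new)   # "0,53" "34,6" "24,0" "0,30" "0,12" "34,0" "34,7"
```

A letter with no matching column gives NA via match(), so missing columns show up as "NA" in ad.new rather than silently picking a wrong value.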
dput for testa:
structure(c("chr10:101544447", "chr10:102053031", "chr10:102778767",
"chr10:102789831", "chr10:102989480", "chr10:102053031", "chr10:102053031",
"0", "6", "0", "0", "0", "0", "0", "0", "34", "24", "0", "0",
"34", "34", "0", "0", "0", "0", "0", "0", "7", "53", "0", "0",
"30", "12", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"chr10", "chr10", "chr10", "chr10", "chr10", "chr10", "chr10",
"101544447", "102053031", "102778767", "102789831", "102989480",
"102053031", "102053031", "A", "C", "C", "C", "C", "C", "C",
"T", "A", "T", "T", "T", "G", "G", "snp", "snp", "snp", "snp",
"snp", "snp:102053031:flat", "snp", "nonsynonymous SNV",
"intronic", "nonsynonymous SNV", "nonsynonymous SNV", "ncRNA_exonic",
"intronic", "intronic", "ABCC2:NM_000392:exon2:c.A116T:p.Y39F,",
"PKD2L1", "PDZD7:NM_024895:exon8:c.G1136A:p.R379Q,PDZD7:NM_001195263:exon8:c.G1136A:p.R379Q,",
"PDZD7:NM_024895:exon2:c.G146A:p.R49Q,PDZD7:NM_001195263:exon2:c.G146A:p.R49Q,",
"LBX1-AS1", "PKD2L1", "PKD2L1"), .Dim = c(7L, 15L), .Dimnames = list(
c("1", "2", "3", "4", "5", "6", "7"), c("start", "A", "C",
"G", "T", "N", "=", "-", "chr", "end", "REF", "ALT", "TYPE",
"refGene::location", "refGene::type")))
Expected result
ad.new
"0,53"
"34,6"
"24,0"
"0,30"
"0,12"
"0,13"
"34,7"
Something like this should work:
# apply the "normal" rule (not considering the flat exceptions)
alts <- as.numeric(diag(testa[, testa[, "ALT"]]))
refs <- as.numeric(diag(testa[, testa[, "REF"]]))
res <- paste(refs, alts, sep = ",")
# replace lines whose TYPE ends with "flat"
flats <- grep('.*flat$', testa[, "TYPE"])
res[flats] <-
  unlist(lapply(flats, function(x) {
    startId <- testa[x, "start"]
    # other rows with the same start id, excluding the flat row itself
    selection <- setdiff(which(testa[, "start"] == startId), x)
    # REF part is 0, as in the expected result
    paste0("0,", sum(alts[selection]))
  }))
ad.new <- as.matrix(res)
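A vectorized variant of the lapply() loop above, sketched with ave(): it computes the total ALT count per start id once, then subtracts each flat row's own ALT count so only the other matching rows remain. This swaps in ave() in place of the explicit loop; on this data it should give the same result. testa is a trimmed copy of the dput and `row_col_value` is a hypothetical helper.

```r
# Trimmed copy of testa: only the columns used here, values taken from the dput
testa <- cbind(
  start = c("chr10:101544447", "chr10:102053031", "chr10:102778767",
            "chr10:102789831", "chr10:102989480", "chr10:102053031",
            "chr10:102053031"),
  A    = c("0", "6", "0", "0", "0", "0", "0"),
  C    = c("0", "34", "24", "0", "0", "34", "34"),
  G    = c("0", "0", "0", "0", "0", "0", "7"),
  T    = c("53", "0", "0", "30", "12", "0", "0"),
  REF  = c("A", "C", "C", "C", "C", "C", "C"),
  ALT  = c("T", "A", "T", "T", "T", "G", "G"),
  TYPE = c("snp", "snp", "snp", "snp", "snp", "snp:102053031:flat", "snp")
)

# Hypothetical helper: value in the column named by each row's letter
row_col_value <- function(m, col_letters) {
  as.numeric(m[cbind(seq_len(nrow(m)), match(col_letters, colnames(m)))])
}

refs <- row_col_value(testa, testa[, "REF"])
alts <- row_col_value(testa, testa[, "ALT"])
res <- paste(refs, alts, sep = ",")

flats <- grep("flat$", testa[, "TYPE"])
# every row gets the sum of ALT counts over all rows sharing its start id
alt_sum_by_start <- ave(alts, testa[, "start"], FUN = sum)
# remove the flat row's own ALT count; REF part is 0 as in the expected result
res[flats] <- paste0("0,", alt_sum_by_start[flats] - alts[flats])

ad.new <- as.matrix(res)
print(ad.new)
```

As in the loop version, if two flat rows shared a start id, each would still include the other's ALT count in its sum.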
> ad.new
[,1]
[1,] "0,53"
[2,] "34,6"
[3,] "24,0"
[4,] "0,30"
[5,] "0,12"
[6,] "0,13"
[7,] "34,7"