Reputation: 1
I am trying to write a function that counts the characters between "a" "t" "g" and "t" "a" "g", "t" "g" "a", or "t" "a" "a" inside a vector, but my code gets stuck in the while loop. An example would be x = "a" "a" "a" "t" "a" "t" "g" "t" "c" "g" "t" "t" "t" "t" "a" "g". In this example the code should count 6 characters between "a" "t" "g" and "t" "a" "g". Any help will be appreciated :).
orfs <- function(x, p) {
  count <- 0
  cntorfs <- 0
  n <- length(x)
  v <- n - 2
  for (i in 1:v) {
    if (x[i] == "a" && x[i+1] == "t" && x[i+2] == "g") {
      k <- i + 3
      w <- x[k]
      y <- x[k+1]
      z <- x[k+2]
      while (((w != "t") && (y != "a") && (z != "g")) || ((w != "t") && (y != "a") && (z != "a")) || ((w != "t") && (y != "g") && (z != "a")) || (i + 2 > v)) {
        count <- count + 1
        k <- k + 1
        w <- x[k]
        y <- x[k+1]
        z <- x[k+2]
      }
    }
    if (count > p) {
      cntorfs <- cntorfs + 1
    }
    if (count != 0) {
      count <- 0
    }
  }
  cat("orf:", cntorfs)
}
Upvotes: 0
Views: 253
Reputation: 50668
This is a very inefficient and un-R-like way to count the number of characters between two patterns.
Here is an alternative using gsub that should get you started and can be extended to account for the other stop codons:
x <- c("a", "a", "a", "t", "a", "t", "g", "t", "c", "g", "t", "t", "t", "t", "a", "g")
nchar(gsub("[actg]*atg([actg]*)tag[actg]*", "\\1", paste0(x, collapse = "")))
#[1] 6
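As a rough sketch of how that extension to the other stop codons might look in base R, the pattern can use an alternation for the three stop codons and a lazy quantifier, so the match ends at the nearest stop after the first "atg". The helper name orf_gap is hypothetical, and the sketch assumes that "nearest stop at any offset" is the intended meaning (it does not enforce a reading frame, just like the loop in the question).

```r
# Hypothetical helper: number of characters between the first "atg"
# and the nearest following stop codon ("tag", "tga" or "taa").
orf_gap <- function(x) {
  s <- paste0(x, collapse = "")
  # Lazy capture group: stops at the first stop codon after "atg"
  m <- regmatches(s, regexec("atg([acgt]*?)(tag|tga|taa)", s, perl = TRUE))[[1]]
  # regmatches() returns character(0) when there is no match
  if (length(m) == 0) return(NA_integer_)
  nchar(m[2])
}

x <- c("a", "a", "a", "t", "a", "t", "g", "t", "c", "g", "t", "t", "t", "t", "a", "g")
orf_gap(x)
#[1] 6
```

Unlike the gsub one-liner, this returns NA when no start/stop pair is present instead of the length of the whole string.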
A more robust and general approach makes use of Biostrings::matchPattern. I would strongly advise against reinventing the wheel here, and instead recommend using some of the standard Bioconductor packages that were developed for exactly these kinds of tasks.
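A minimal sketch of the matchPattern route, assuming the Bioconductor package Biostrings is installed (the code is guarded so it does nothing otherwise). It only handles the "atg" ... "tag" pair from the example; the variable names are illustrative, and the sequence is upper-cased before building the DNAString as a precaution.

```r
x <- c("a", "a", "a", "t", "a", "t", "g", "t", "c", "g", "t", "t", "t", "t", "a", "g")
gap <- NA_integer_

# Assumption: Biostrings is available; skip the computation if it is not
if (requireNamespace("Biostrings", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Biostrings))
  s <- DNAString(toupper(paste0(x, collapse = "")))
  starts <- matchPattern("ATG", s)   # positions of the start codon
  stops  <- matchPattern("TAG", s)   # positions of one stop codon
  first_end <- end(starts)[1]
  # First stop codon that begins after the first start codon ends
  stop_pos <- start(stops)[start(stops) > first_end][1]
  gap <- stop_pos - first_end - 1L
}
gap
```

For the example vector the start codon ends at position 7 and "tag" begins at position 14, so gap should be 6 when Biostrings is available.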
Upvotes: 1