Reputation: 13
I want to loop through a sequence of letters 'ABCDEFGHIJK', but a for loop in R iterates over one value at a time. Is there a way to loop over 3 values at a time? In this case, the sequence 'ABCDEFGHIJK' would be processed as 'ABC', then 'DEF', and so on.
I've tried changing the length used in the loop, but I haven't found a way. I can do this in Python, but I couldn't find any information about doing it in R, including in R's built-in help.
xp <-'ACTGCT'
for(i in 1:length(xp)){
if(i == 'ACG'){
print('T')
}
}
Upvotes: 1
Views: 140
Reputation: 269644
1) Base R Iterate over the sequence 1, 4, 7, ... and use substr to extract the 3-character portion of the input string starting at that position. Then perform whatever processing is desired. If there are fewer than 3 characters in the last chunk, substr returns whatever is available for that chunk. This is a particularly good approach if you want to exit early, since a break can be inserted into the loop.
for(i in seq(1, nchar(xp), 3)) {
s <- substr(xp, i, i+2)
print(s) # replace with desired processing
}
## [1] "ACT"
## [1] "GCT"
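The early-exit point can be sketched as follows. This is only an illustration: the target chunk "GCT" and the variable found_at are assumptions made for the example and are not part of the question.

```r
xp <- "ACTGCT"
target <- "GCT"          # hypothetical chunk to search for
found_at <- NA_integer_  # stays NA if the target never appears

for (i in seq(1, nchar(xp), by = 3)) {
  s <- substr(xp, i, i + 2)
  if (s == target) {
    found_at <- i  # remember where the matching chunk starts
    break          # stop scanning; later chunks are never examined
  }
}
found_at
## [1] 4
```

If no chunk matches, found_at remains NA, which makes the "not found" case easy to test with is.na.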
1a) lapply We can translate the loop to lapply or sapply if one iteration does not depend on another.
process <- function(i) {
s <- substr(xp, i, i+2)
s # replace with desired processing
}
sapply(seq(1, nchar(xp), 3), process)
## [1] "ACT" "GCT"
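When each iteration should return a single TRUE/FALSE, vapply is a possible variation on the same idea, since it checks the type and length of every result. A minimal sketch, assuming a hypothetical target chunk "GCT"; the names starts and is_match are illustrative:

```r
xp <- "ACTGCT"
starts <- seq(1, nchar(xp), by = 3)

# logical(1) declares that each call must return exactly one logical value
is_match <- vapply(starts, function(i) substr(xp, i, i + 2) == "GCT", logical(1))
is_match
## [1] FALSE  TRUE
```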
2) rollapply Another possibility is to break the string up into single characters and then iterate over those, passing a 3-element vector of single characters to the indicated function. Here we have used toString to process each chunk, but it can be replaced with any other suitable function.
library(zoo)
rollapply(strsplit(xp, "")[[1]], 3, by = 3, toString, align = "left", partial = TRUE)
## [1] "A, C, T" "G, C, T"
Upvotes: 0
Reputation: 5138
Here is a stringr solution that returns a named list indicating whether each search string matches one of the 3-character chunks:
library(stringr)
# Split string into sequences of 3 (or fewer if length is not multiple of 3)
split_strings <- str_extract_all("ABCDEFGHIJK", ".{1,3}", simplify = TRUE)[1,]
# The strings you want to loop through / search for
x <- c("ABC", "DEF", "GHI", "LMN")
# Output is named list
sapply(x, `%in%`, split_strings, simplify = FALSE)
$ABC
[1] TRUE
$DEF
[1] TRUE
$GHI
[1] TRUE
$LMN
[1] FALSE
Or, if you only want to look for one element:
"ABC" %in% split_strings
[1] TRUE
Upvotes: 2
Reputation: 887128
An option would be to split the string every 3 characters and then do the comparison:
v1 <- 'ABCDEFGHIJK'
lapply(strsplit(v1, "(?<=.{3})", perl = TRUE), function(x) x == 'ACG')
#[[1]]
#[1] FALSE FALSE FALSE FALSE
Upvotes: 2
Reputation: 51592
We can use the vectorized substring, i.e.
substring('ABCDEFGHIJK', seq(1, nchar('ABCDEFGHIJK') - 1, 3), seq(3, nchar('ABCDEFGHIJK'), 3)) == 'ACG'
#[1] FALSE FALSE FALSE FALSE
NOTE: This only extracts complete 3-character chunks. If 2 characters are left over at the end, they are not returned; an empty string appears in their place. For the above example, it outputs:
substring('ABCDEFGHIJK', seq(1, nchar('ABCDEFGHIJK') - 1, 3), seq(3, nchar('ABCDEFGHIJK'), 3))
#[1] "ABC" "DEF" "GHI" ""
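If the trailing partial chunk should be kept, one possible variation is to derive the end positions from the same vector of start positions. substring stops at the end of the string, so the last chunk simply comes back shorter. A minimal sketch, with starts and chunks as illustrative names:

```r
x <- "ABCDEFGHIJK"
starts <- seq(1, nchar(x), by = 3)

# An end position past the end of the string is truncated to the string length
chunks <- substring(x, starts, starts + 2)
chunks
## [1] "ABC" "DEF" "GHI" "JK"
```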
Upvotes: 2