Reputation: 1
I'm very new to R so sorry if this is a simple question. I basically have a dataset that has a list of letters. How would I go about finding if a specific sequence of letters such as "agtggt" exists, and if it does, how many of them exist?
I tried to do something with the ifelse function:
ifelse("a" %in% chain,"yes","no" )
My approach was to basically tell R to find "a" and that if it does, find "g", and that if it does, find "t", etc. Is this approach correct?
Upvotes: 0
Views: 57
Reputation: 333
In addition to the answer that @ThomasIsCoding provided, you could also write your own function for this kind of search:
# sample vector
chain <- c('a', 'a', 'a', 'g', 't', 'c', 'a', 't', 't')
# count non-overlapping occurrences of sequence_string in a vector of single letters
string_finder <- function(sequence_string, target_variable) {
  # length of the sequence being searched for
  len_seq <- nchar(sequence_string)
  # last index at which a full-length window can start
  iter_end <- length(target_variable) - len_seq + 1
  # number of matches found so far
  occurrences <- 0
  # slide a window of len_seq letters along the vector
  i <- 1
  # use <= so that a match ending on the last element is also checked
  while (i <= iter_end) {
    # if match found -> continue looking only after the match
    if (sequence_string == paste0(target_variable[i:(i + len_seq - 1)], collapse = '')) {
      occurrences <- occurrences + 1
      i <- i + len_seq
    } else {
      # else continue with the next position
      i <- i + 1
    }
  }
  return(occurrences)
}
# example
string_finder("agt", chain)  # returns 1
This way you would avoid working with regular expressions.
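Note that the function above jumps past each match, so overlapping matches are counted only once. If overlaps should count too, a minimal sketch could look like the following. The name count_overlapping and its arguments are hypothetical, and it assumes the data is a vector of single letters:

```r
# Hypothetical helper: counts every window of letters_vec that equals pattern,
# including windows that overlap each other.
count_overlapping <- function(pattern, letters_vec) {
  # split the pattern into single letters, e.g. "agt" -> c("a", "g", "t")
  pat <- strsplit(pattern, "")[[1]]
  n <- length(letters_vec)
  k <- length(pat)
  # nothing to count if the pattern is empty or longer than the data
  if (k == 0 || n < k) return(0L)
  # every index at which a full-length window can start
  starts <- seq_len(n - k + 1)
  # TRUE for each start position whose window equals the pattern
  hits <- vapply(
    starts,
    function(i) all(letters_vec[i:(i + k - 1)] == pat),
    logical(1)
  )
  sum(hits)
}

chain <- c('a', 'a', 'a', 'g', 't', 'c', 'a', 't', 't')
count_overlapping("agt", chain)  # 1
count_overlapping("aa", chain)   # 2, the windows starting at 1 and 2 overlap
```

Whether overlapping matches should count depends on what you are measuring, so pick whichever behaviour fits your data.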
Upvotes: 1
Reputation: 102529
Perhaps you can try grepl (assuming chain is a single string):
c("no","yes")[1+grepl("agtggt",chain)]
If you want to know how many times "agtggt" occurs in chain, you can try
length(regmatches(chain,gregexpr("agtggt",chain))[[1]])
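If chain is instead a vector of single letters, which the `"a" %in% chain` attempt in the question suggests, grepl would test each letter separately and never match. A minimal sketch under that assumption collapses the vector into one string first. The example data and the names chain_string, found and n_matches are made up for illustration, and fixed = TRUE is an optional choice that treats the pattern as a literal string rather than a regular expression:

```r
# Assumption: chain is a vector of single letters (made-up example data).
chain <- c("a", "g", "t", "g", "g", "t", "c", "a", "g", "t", "g", "g", "t")

# Collapse the letters into one string: "agtggtcagtggt"
chain_string <- paste0(chain, collapse = "")

# Does the sequence occur at all?
found <- grepl("agtggt", chain_string, fixed = TRUE)

# How many non-overlapping occurrences are there?
n_matches <- lengths(
  regmatches(chain_string, gregexpr("agtggt", chain_string, fixed = TRUE))
)

found      # TRUE
n_matches  # 2
```

When there is no match, regmatches returns an empty character vector for that string, so the count comes out as 0.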
Upvotes: 2