Jay

Reputation: 1

How would I go about finding a specific sequence of letters within a dataset that has a list of letters?

I'm very new to R, so sorry if this is a simple question. I have a dataset that contains a list of letters. How would I check whether a specific sequence of letters such as "agtggt" exists in it, and if it does, how many times it occurs?

I tried to do something with the ifelse function:

ifelse("a" %in% chain, "yes", "no")

My approach was to basically tell R to find "a" and that if it does, find "g", and that if it does, find "t", etc. Is this approach correct?

Upvotes: 0

Views: 57

Answers (2)

J. Ring

Reputation: 333

In addition to the answer that @ThomasIsCoding provided, you could also write your own function for this kind of search:

# vector sample
chain <- c('a', 'a', 'a', 'g', 't', 'c', 'a', 't', 't')
# define function
string_finder <- function(sequence_string, target_variable) {
  # get sequence_string length
  len_seq <- nchar(sequence_string)
  # get reference for last iteration
  iter_end <- length(target_variable) - len_seq + 1
  # set occurrences counter
  occurrences <- 0
  # loop through vector; <= so the last possible window is checked too
  i <- 1
  while (i <= iter_end) {
    # if match found -> continue looking only after match
    if (sequence_string == paste0(target_variable[i:(i + len_seq - 1)], collapse = '')) {
      occurrences <- occurrences + 1
      i <- i + len_seq
    } else {
      # else continue with the next position
      i <- i + 1
    }
  }
  return(occurrences)
}
# example
string_finder("agt", chain)

This way you would avoid working with regular expressions.
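
Note that the loop above jumps past each match, so it counts non-overlapping matches only. If overlapping matches should count as well, a minimal sketch could compare every window of the vector against the pattern. The helper name count_overlapping is made up here for illustration and is not part of the answer above:

```r
# Hypothetical helper: counts matches, including overlapping ones.
# Assumes target_variable is a character vector with one letter per element.
count_overlapping <- function(sequence_string, target_variable) {
  # split the search string into single letters
  pattern <- strsplit(sequence_string, "", fixed = TRUE)[[1]]
  n <- length(pattern)
  # nothing to find if the pattern is empty or longer than the vector
  if (n == 0 || length(target_variable) < n) return(0L)
  # every position where a full-length window can start
  starts <- seq_len(length(target_variable) - n + 1)
  # TRUE for each window that equals the pattern letter by letter
  is_match <- vapply(
    starts,
    function(i) all(target_variable[i:(i + n - 1)] == pattern),
    logical(1)
  )
  sum(is_match)
}

# example: the windows are "aa", "aa", "ag", so two of them match
count_overlapping("aa", c("a", "a", "a", "g"))
```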

Upvotes: 1

ThomasIsCoding

Reputation: 102529

Perhaps you can try grepl (this assumes chain is a single character string):

c("no","yes")[1+grepl("agtggt",chain)]

If you want to know how many times "agtggt" occurs in chain, you can try

length(regmatches(chain,gregexpr("agtggt",chain))[[1]])
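
If chain is instead a vector of single letters, the calls above would test each letter on its own and never see the full sequence. A minimal sketch under that assumption collapses the vector into one string first; the sample data and the names chain_str, hits and n_matches are made up here for illustration:

```r
# Assumption: chain is a character vector with one letter per element
chain <- c("a", "g", "t", "g", "g", "t", "c", "a", "g", "t", "g", "g", "t")

# collapse the letters into a single string
chain_str <- paste(chain, collapse = "")

# fixed = TRUE treats the pattern as a literal string, not a regular expression
hits <- gregexpr("agtggt", chain_str, fixed = TRUE)[[1]]

# gregexpr returns -1 when nothing matches, so count only real positions
n_matches <- sum(hits > 0)
n_matches
```

Like the regmatches call, this counts non-overlapping matches.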

Upvotes: 2
