user1294476

String manipulation in R

I have a list of DNA sequences, for example "AGAACCTTATTGGGTCAAGT". How would I create, in R, a list of all the consecutive substrings of a given length (for example, 4) that occur in such a sequence?

In this case, the first string would be "AGAA", the second "GAAC", the third "AACC", and so forth, with each one starting one character further along the sequence.

Answers (1)

d.b

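x = "AGAACCTTATTGGGTCAAGT"
# substrings of length 4: start at each position i from 1 to nchar(x) - 3 and take characters i..i+3
sapply(1:(nchar(x)-3), function(i) substr(x, i, i+3))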
#[1] "AGAA" "GAAC" "AACC" "ACCT" "CCTT" "CTTA" "TTAT" "TATT" "ATTG" "TTGG" "TGGG" "GGGT" "GGTC" "GTCA" "TCAA" "CAAG" "AAGT"
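A possible generalisation, sketched under a few assumptions: `kmers` is a hypothetical helper name (not part of base R), the window length `k` is passed as a parameter instead of being hard-coded as 4, and the input is a list of plain character strings, since the question mentions a list of sequences. It relies on base R's `substring()`, which is vectorised over its `first` and `last` arguments, so a single call returns every window.

```r
# Hypothetical helper: all overlapping substrings (k-mers) of length k in x
kmers <- function(x, k) {
  n <- nchar(x)
  # sequence shorter than k: there are no complete windows
  if (n < k) return(character(0))
  # valid start positions are 1 .. n - k + 1
  starts <- seq_len(n - k + 1)
  # substring() is vectorised over first/last, so one call returns every window
  substring(x, starts, starts + k - 1)
}

# Apply to each sequence in a list (example data, assumed for illustration)
seqs <- list("AGAACCTTATTGGGTCAAGT", "ACGT")
lapply(seqs, kmers, k = 4)
```

The explicit length check and `seq_len()` matter for short sequences: with the `1:(nchar(x)-3)` form, a 3-character string gives `1:0`, which counts down to `c(1, 0)` and still produces output instead of an empty result.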
