Reputation: 109
This kind of question has been asked many times, but I could not find an answer that fits my needs.
I know some ways of splitting strings in R. If I have a string x <- "AGCAGT" and want to split it into substrings of three characters, I would do this with
substring(x, seq(1, nchar(x)-1, 3), seq(3, nchar(x), 3))
and I can split it into substrings of two characters, much faster, with
split <- strsplit(x, "")[[1]]
substrg <- paste0(split[c(TRUE, FALSE)], split[c(FALSE, TRUE)])
As a new user of R, I find it difficult to split a string according to my requirements. If x <- "AGCTG" and I use substring(x, seq(1, nchar(x)-1, 3), seq(3, nchar(x), 3)), I do not get the last two-character substring. I get
"AGC" ""
However, I would like to get something like
"AGC" "TG"
or, if I have x <- "AGCT" and split it 3 characters at a time, I would like to get something like
"AGC" "T"
In short: how can I split a string into substrings of a desired length (2, 3, 4, 5, ..., n) while also keeping the trailing characters that are fewer than the desired length?
Upvotes: 0
Views: 262
Reputation: 109
Answer by zx8754. Unfortunately, he deleted the answer after someone marked the question as a duplicate. If he would like to post this as an answer, I'll delete my post.
x <- "AGCGGCCAGCTGCCTGAA"
mylen <- 5
ss <- strsplit(x, "")[[1]]
sapply(split(ss, ceiling(seq_along(ss)/mylen)), paste, collapse = "")
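To see why this keeps the leftover characters, it may help to look at the grouping vector on the shorter string from the question. This is only a small sketch; the variable names grp and chunks are made up for illustration. ceiling(seq_along(ss) / mylen) gives each character a chunk number, so the last chunk simply ends up with fewer members.

```r
x <- "AGCTG"
mylen <- 3

# One element per character: "A" "G" "C" "T" "G"
ss <- strsplit(x, "")[[1]]

# Chunk number for each character position: 1 1 1 2 2
grp <- ceiling(seq_along(ss) / mylen)

# Glue the characters of each chunk back together;
# unname() drops the chunk numbers that sapply() keeps as names
chunks <- unname(sapply(split(ss, grp), paste, collapse = ""))
chunks
# [1] "AGC" "TG"
```

Without unname(), the result is the same vector but with the names "1", "2", ... attached.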
Upvotes: 1
Reputation: 19970
Here is one possible solution in a few simple steps.
x <- "AGCGGCCAGCTGCCTGAA"
# desired length
mylen <- 5
# start indices
start <- seq(1, nchar(x), mylen)
# end indices (capped at the length of the string)
end <- pmin(start + mylen - 1, nchar(x))
substring(x, start, end)
[1] "AGCGG" "CCAGC" "TGCCT" "GAA"
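If you need this for several lengths, the same steps could be wrapped in a small helper. This is just a sketch: the name split_every is made up, and it assumes x is a single string and n is a positive whole number. The guard for an empty string is there because seq(1, 0, by = n) would otherwise raise an error.

```r
# Hypothetical helper: split one string into pieces of length n,
# keeping a shorter final piece if nchar(x) is not a multiple of n
split_every <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  if (nchar(x) == 0) return(character(0))
  start <- seq(1, nchar(x), by = n)        # start indices
  end <- pmin(start + n - 1, nchar(x))     # end indices, capped at the string length
  substring(x, start, end)
}

split_every("AGCTG", 3)
# [1] "AGC" "TG"
split_every("AGCT", 3)
# [1] "AGC" "T"
split_every("AGCAGT", 2)
# [1] "AG" "CA" "GT"
```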
Upvotes: 1