Reputation: 705
Given the string:
string <- "AATTGGCGCTAG---AT-TTACG----"
How can I split it into substrings based on the occurrence of one or more "-"? For example:
string1 <- "AATTGGCGCTAG"
string2 <- "---"
string3 <- "AT"
string4 <- "-"
string5 <- "TTACG"
string6 <- "----"
I have tried:
strsplit(string, "[-]+")
However, this does not return the strings of "-".
Upvotes: 3
Views: 302
Reputation: 626758
You may match them with
[^-]+|-+
See the regex demo. It matches:
[^-]+ - 1 or more chars other than -
| - or
-+ - 1 or more - chars.
In R:
x <- "AATTGGCGCTAG---AT-TTACG----"
regmatches(x, gregexpr("[^-]+|-+", x))
Or
library(stringr)
x <- "AATTGGCGCTAG---AT-TTACG----"
str_extract_all(x, "[^-]+|-+")
Output
## => [[1]]
## [1] "AATTGGCGCTAG" "---" "AT" "-" "TTACG" "----"
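As a quick sanity check on the matching approach, the pieces can be pasted back together and compared with the input. This is only a minimal sketch in base R; the names `parts` and `is_gap` are illustrative and not part of the answer above.

```r
x <- "AATTGGCGCTAG---AT-TTACG----"

# Extract alternating runs of non-dash and dash characters
parts <- regmatches(x, gregexpr("[^-]+|-+", x))[[1]]

# The pieces should concatenate back to the original string
identical(paste(parts, collapse = ""), x)

# Flag which pieces are runs of "-" and record each piece's length
is_gap <- grepl("^-", parts)
data.frame(piece = parts, is_gap = is_gap, length = nchar(parts))
```

Because every character is either a dash or not, the two alternatives together cover the whole string, so nothing should be dropped.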
Upvotes: 4
Reputation: 521103
Here is a direct fix to your current attempt with strsplit:
string <- "AATTGGCGCTAG---AT-TTACG----"
strsplit(string, "(?<=[^-])(?=[-])|(?<=[-])(?=[^-])", perl=TRUE)[[1]]
[1] "AATTGGCGCTAG" "---" "AT" "-" "TTACG"
[6] "----"
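The idea behind the regex pattern is to split whenever one of the following two conditions is true:
(?<=[^-])(?=[-]) - the preceding character is not a dash and the following character is a dash
(?<=[-])(?=[^-]) - the preceding character is a dash and the following character is not a dash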
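The two lookaround alternatives in the pattern can also be run on their own to see what each one contributes. This is a rough sketch that reuses the string from above; the variable names `into_dashes` and `out_of_dashes` are made up for illustration.

```r
string <- "AATTGGCGCTAG---AT-TTACG----"

# Split only where a non-dash character is followed by a dash
into_dashes <- strsplit(string, "(?<=[^-])(?=[-])", perl = TRUE)[[1]]

# Split only where a dash is followed by a non-dash character
out_of_dashes <- strsplit(string, "(?<=[-])(?=[^-])", perl = TRUE)[[1]]

print(into_dashes)
print(out_of_dashes)
```

Each alternative handles one direction of the transition, so both are needed to isolate the dash runs. Both lookarounds are zero-width, so no characters are consumed by the split.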
Upvotes: 0