Reputation: 775
I have a string like this:
s <- "aaehhhhhhhaannd"
How can I split the string into the following format with R?
c("aa", "e", "hhhhhhh", "aa","nn","d")
Upvotes: 3
Views: 126
Reputation: 626728
You may use base R strsplit with a PCRE regex based on lookarounds:
s <- "aaehhhhhhhaannd"
strsplit(s, "(?<=(.))(?!\\1)", perl=TRUE)
# [[1]]
# [1] "aa" "e" "hhhhhhh" "aa" "nn" "d"
Regex details:
(?<=(.)) - a positive lookbehind ((?<=...)) that "looks" left and captures any char into Group 1 with the (.) capturing group (this value can be referred to from inside the pattern with the help of a \1 backreference)
(?!\\1) - a negative lookahead that fails the match if the same value as captured into Group 1 appears immediately to the right of the current location.
Since the lookarounds do not consume text, the split occurs at the location between different characters.
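To see the same pattern on other inputs, here is a minimal sketch that wraps the call in a helper. The name split_runs is hypothetical and only for illustration. Since strsplit is vectorized, it returns one character vector per input string.

```r
# split_runs is a hypothetical helper name; the pattern is the one explained above.
split_runs <- function(x) {
  # Split wherever the character on the left differs from the one on the right
  strsplit(x, "(?<=(.))(?!\\1)", perl = TRUE)
}

res <- split_runs(c("aabb", "xyz", "mississippi"))
print(res)
```

Each list element holds the runs for the corresponding input, so res[[1]] is the split of "aabb".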
NOTE: If you want . to match a newline, too, add (?s) at the start of the pattern (as in PCRE regex, . does not match line breaks by default).
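A small sketch of the difference, using a made-up input (s2 is an assumption, not from the question) that contains two consecutive line breaks:

```r
s2 <- "aa\n\nb"  # illustrative input with a run of two newlines

# Without (?s): . cannot capture "\n", so no split is found right after a line break
without_s <- strsplit(s2, "(?<=(.))(?!\\1)", perl = TRUE)[[1]]

# With (?s): . also captures "\n", so the newline run is treated like any other run
with_s <- strsplit(s2, "(?s)(?<=(.))(?!\\1)", perl = TRUE)[[1]]

print(without_s)
print(with_s)
```

With (?s) the two newlines should come out as their own element, while without it they stay attached to the text that follows.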
Upvotes: 3
Reputation: 214917
You can use str_extract_all from stringr with the regex (.)\\1*, which uses a backreference to match runs of a repeated character:
library(stringr)
str_extract_all("aaehhhhhhhaannd", "(.)\\1*")
#[[1]]
#[1] "aa" "e" "hhhhhhh" "aa" "nn" "d"
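If you would rather avoid the stringr dependency, the same extraction idea can be sketched in base R with gregexpr and regmatches. This is an alternative to the answer above, not part of it. perl = TRUE is passed here so the backreference is handled by PCRE.

```r
s <- "aaehhhhhhhaannd"

# Find every run of a repeated character, then pull the matched substrings out
runs <- regmatches(s, gregexpr("(.)\\1*", s, perl = TRUE))[[1]]
print(runs)
# [1] "aa"      "e"       "hhhhhhh" "aa"      "nn"      "d"
```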
Upvotes: 3