Reputation: 5029
Suppose I need to extract different parts of a string as a list. For example, I would like to split the string "aaa12xxx" into three parts.
One possibility is to make three gsub calls:
parts = c()
parts[1] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\1', "aaa12xxx")
parts[2] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")
parts[3] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\3', "aaa12xxx")
Of course, this seems wasteful (even if it is done inside a for loop). Isn't there a function that simply returns the list of parts, given a regex and a test string?
Upvotes: 3
Views: 924
Reputation: 174696
Just split the input string with strsplit and pick the parts you want.
> x <- "aaa12xxx"
> strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE)
[[1]]
[1] "aaa" "12" "xxx"
Get each part by its index:
> m <- unlist(strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE))
> m[1]
[1] "aaa"
> m[2]
[1] "12"
> m[3]
[1] "xxx"
(?<=[[:alpha:]])(?=\\d)
Matches every boundary that is preceded by a letter and followed by a digit.
|
OR
(?<=\\d)(?=[[:alpha:]])
Matches every boundary that is preceded by a digit and followed by a letter.
Splitting the input at the matched boundaries gives the desired output.
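To apply the same boundary split to several strings at once, the pattern can be wrapped in a small function. This is only a sketch: the name split_alnum and the second test string "b7c" are made up for illustration and are not part of the answer above. The pattern needs a lookbehind as well as a lookahead, because strsplit restarts its search at each split point; a lookahead-only pattern would match again at the very start of the remainder and peel off single characters.

```r
# Hypothetical helper (not from the answer): split each string at every
# letter/digit boundary, using the same lookaround pattern as above.
split_alnum <- function(x) {
  boundary <- "(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])"
  # perl = TRUE is required for lookbehind/lookahead support.
  strsplit(x, boundary, perl = TRUE)
}

# strsplit is vectorised, so the result is a list with one element per input.
parts <- split_alnum(c("aaa12xxx", "b7c"))
print(parts[[1]])
print(parts[[2]])
```

Because the result is a list, parts[[1]][2] picks out the digit run of the first string without calling unlist first.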
Upvotes: 4
Reputation: 67968
(\\d+)|([a-zA-Z]+)
or
([[:alpha:]]+)|([0-9]+)
You can just grab the captures. Use str_match_all() from library(stringr). See the demo:
https://regex101.com/r/fA6wE2/8
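For reference, here is a sketch of how the matches could be collected in R itself. The base R calls regmatches, gregexpr, and regexec are swapped in so that the example runs without extra packages; the stringr call is guarded and runs only if that package happens to be installed. The variable names are illustrative.

```r
x <- "aaa12xxx"

# All successive matches of "a run of letters or a run of digits" (base R).
tokens <- regmatches(x, gregexpr("[[:alpha:]]+|[0-9]+", x))[[1]]
print(tokens)

# Capture groups of the question's three-group pattern in a single call.
# Element 1 of the result is the full match, so drop it.
m <- regmatches(x, regexec("([[:alpha:]]+)([0-9]+)([[:alpha:]]+)", x))[[1]]
groups <- m[-1]
print(groups)

# stringr version of the first approach; skipped if stringr is not installed.
if (requireNamespace("stringr", quietly = TRUE)) {
  sm <- stringr::str_match_all(x, "([[:alpha:]]+)|([0-9]+)")[[1]]
  # Column 1 holds the full match of each hit; the other columns hold the
  # capture groups, with NA for the alternative that did not take part.
  print(sm[, 1])
}
```

The regexec form is the closest to what the question asks for: one pattern, one call, and the parts come back as a character vector.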
Upvotes: 3