Reputation: 5088
This regex: (.*?)(?:I[0-9]-)*I3(?:-I[0-9])*
matches in two parts. The first part, (.*?), is a lazy prefix; it has to be followed by the second part, a hyphen-separated run of I-tokens (I plus a digit) that contains at least one I3.
How can I extract each of these two parts as a separate group for every match?
library(stringr)
data <- c("A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3-I7")
str_extract_all(data, "(.*?)(?:I[0-9]-)*I3(?:-I[0-9])*")
This gives me:
[[1]]
[1] "A-B-C-I1-I2-D-E-F-I1-I3" "-D-D-D-D-I1-I1-I2-I1-I1-I3-I3-I7"
However, I would want something along the lines of:
[[1]]
[1] "A-B-C-I1-I2-D-E-F" [2] "I1-I3"
[[2]]
[1] "D-D-D-D" [2] "I1-I1-I2-I1-I1-I3-I3-I7"
The key here is that the regex matches twice, and each match contains two groups. I want every match to have a list of its own, and that list to contain two elements, one for each group.
Upvotes: 1
Views: 373
Reputation: 70732
You need to wrap a capturing group around the second part of your expression. If you're using stringr for this task, I would use str_match_all instead, which returns the captured groups along with each full match:
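library(stringr)
data <- c('A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3-I7')
# Column 1 of the result is the full match; columns 2 and 3 are the capture groups.
# drop = FALSE keeps a matrix even when there is only a single match.
mat <- str_match_all(data, '-?(.*?)-((?:I[0-9]-)*I3(?:-I[0-9])*)')[[1]][, 2:3, drop = FALSE]
colnames(mat) <- c('Group 1', 'Group 2')
mat
#      Group 1             Group 2
# [1,] "A-B-C-I1-I2-D-E-F" "I1-I3"
# [2,] "D-D-D-D"           "I1-I1-I2-I1-I1-I3-I3-I7"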
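The question asks for one list entry per match rather than a matrix. A minimal sketch of that reshaping, assuming stringr is installed; the names pattern, m, and groups are illustrative and not part of the original answer:

```r
library(stringr)

data <- "A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3-I7"
pattern <- "-?(.*?)-((?:I[0-9]-)*I3(?:-I[0-9])*)"

# str_match_all() returns one matrix per input string; column 1 holds the
# full match, columns 2 and 3 hold the two capture groups.
m <- str_match_all(data, pattern)[[1]]

# Turn each row (one match) into its own two-element character vector.
groups <- lapply(seq_len(nrow(m)), function(i) unname(m[i, 2:3]))
groups
```

Each element of groups then corresponds to one match, with the prefix first and the I-token run second. For a vector of several input strings, the same lapply step could be applied to every matrix that str_match_all returns.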
Upvotes: 1