Reputation: 2672
Let's look at an example:
abc_def_ghi_jkl
If I choose n = 1, I want the output to be:
group1 = abc_def_ghi
group2 = jkl
If I choose n = 2, I want the output to be:
group1 = abc_def
group2 = ghi_jkl
Note: The _ that separated the two groups is removed.
So far I have only figured out how to select the last group, but my pattern also selects the _:
(?:.(?!(?=\_)))+$
Note 2: I am currently focusing on the regex part, but the code will be used in R, in case that helps to get to a solution.
Upvotes: 0
Views: 36
Reputation: 215057
A possibility to split on the nth occurrence of _ from the end of the string:
# the quantifier {0} below is n - 1; modify it to choose which _ to split on
strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){0}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc_def_ghi" "jkl"          # split on the 1st _ from the end
strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){1}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc_def" "ghi_jkl"          # split on the 2nd _ from the end
strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){2}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc" "def_ghi_jkl"          # split on the 3rd _ from the end
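_(?=([^_]*_){2}[^_]*$) matches an _ that is followed, via the ?= lookahead syntax, by the pattern ([^_]*_){2}[^_]*$. Reading that pattern backwards from the end of the string $: [^_]* skips any trailing non-_ characters, and ([^_]*_) has to match the given number of times, so exactly that many _ lie between the matched _ and the end of the string. Because the lookahead consumes nothing, the split happens only on the matched _, and the rest of the string is kept.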
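If n varies, the quantifier can be filled in programmatically instead of edited by hand. The following is only a sketch under that assumption; split_from_end is a hypothetical helper name, not part of base R or of the answer above.

```r
# Hypothetical helper: split x on the n-th "_" counted from the end.
split_from_end <- function(x, n) {
  stopifnot(n >= 1)
  # n - 1 further underscores must lie between the matched "_" and the end
  pattern <- sprintf("_(?=([^_]*_){%d}[^_]*$)", as.integer(n) - 1L)
  strsplit(x, pattern, perl = TRUE)
}

split_from_end("abc_def_ghi_jkl", 1)
#[[1]]
#[1] "abc_def_ghi" "jkl"
split_from_end("abc_def_ghi_jkl", 2)
#[[1]]
#[1] "abc_def" "ghi_jkl"
```

If the string contains fewer than n underscores, the pattern does not match and strsplit returns the string unsplit.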
Update with str_match from the stringr package:
library(stringr)
str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){0}[^_]*$)")[,2:3]
# [1] "abc_def_ghi" "jkl"
str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){1}[^_]*$)")[,2:3]
# [1] "abc_def" "ghi_jkl"
str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){2}[^_]*$)")[,2:3]
# [1] "abc" "def_ghi_jkl"
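For a whole vector of strings, the same str_match pattern can be wrapped in a function. This is a rough sketch, assuming stringr is installed; match_from_end is a hypothetical name, and drop = FALSE is added so that the result stays a two-column matrix even for a single input string.

```r
library(stringr)

# Hypothetical helper: capture the text before and after the n-th "_" from the end.
match_from_end <- function(x, n) {
  pattern <- sprintf("(.*)_((?:[^_]*_){%d}[^_]*$)", as.integer(n) - 1L)
  # column 1 of the str_match result is the full match; columns 2 and 3 are the groups
  str_match(x, pattern)[, 2:3, drop = FALSE]
}

match_from_end(c("abc_def_ghi_jkl", "a_b_c", "abc"), 2)
#      [,1]      [,2]
# [1,] "abc_def" "ghi_jkl"
# [2,] "a"       "b_c"
# [3,] NA        NA
```

Unlike the strsplit approach, a string with too few underscores yields NA in both columns rather than the unsplit string, which may or may not be what is wanted.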
Upvotes: 1