Yohan Obadia

Reputation: 2672

Getting the two substrings/groups before and after the nth-to-last "_"

Let's look at an example:

abc_def_ghi_jkl

If I choose n = 1, I want the output to be:

group1 = abc_def_ghi
group2 = jkl

If I choose n = 2, I want the output to be:

group1 = abc_def
group2 = ghi_jkl

Note: The _ that separated the two groups is removed.

So far I have only figured out how to select the last group, but my pattern also selects the _:

(?:.(?!(?=\_)))+$

Note 2: I am currently focusing on the regex part, but the code will be used in R, in case that helps to get to a solution.

Upvotes: 0

Views: 36

Answers (1)

akuiper

Reputation: 215057

One option is to split on the nth _ counted from the end of the string, using n - 1 as the quantifier inside the lookahead:

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){0}[^_]*$)", perl = TRUE)
                                     #    ^
                                     #  you can modify the quantifier here
#[[1]]
#[1] "abc_def_ghi" "jkl"                    # split on the 1st _ from the end

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){1}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc_def" "ghi_jkl"                    # split on the 2nd _ from the end

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){2}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc"         "def_ghi_jkl"            # split on the 3rd _ from the end

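_(?=([^_]*_){2}[^_]*$) matches a _ only when the lookahead (?=...) succeeds, that is, when the _ is followed by ([^_]*_){2}[^_]*$. Inside the lookahead, ([^_]*_){2} consumes exactly two more _-terminated chunks, and [^_]*$ then requires that only non-_ characters remain up to the end of the string. So exactly two underscores follow the matched one, which makes it the 3rd _ from the end, and the string is split there. In general, a quantifier of n - 1 selects the nth _ from the end.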
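To avoid editing the quantifier by hand, the pattern can be built from n. The following is only a minimal sketch: the function name split_at_nth_last is hypothetical (it is not part of base R or of the answer above), and it assumes the separator is always a literal _.

```r
# Hypothetical helper: split each string at the n-th "_" counted from the end.
# Assumes the separator is a literal underscore.
split_at_nth_last <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # n - 1 further underscores must follow the one we split on
  k <- as.integer(n) - 1L
  pattern <- paste0("_(?=([^_]*_){", k, "}[^_]*$)")
  strsplit(x, pattern, perl = TRUE)
}

split_at_nth_last("abc_def_ghi_jkl", 1)
#[[1]]
#[1] "abc_def_ghi" "jkl"

split_at_nth_last("abc_def_ghi_jkl", 2)
#[[1]]
#[1] "abc_def" "ghi_jkl"
```

If a string contains fewer than n underscores, the pattern does not match and strsplit returns that string unsplit.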

Update, using str_match from the stringr package:

library(stringr)
str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){0}[^_]*$)")[,2:3]
# [1] "abc_def_ghi" "jkl"     

str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){1}[^_]*$)")[,2:3]
# [1] "abc_def" "ghi_jkl"

str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){2}[^_]*$)")[,2:3]
# [1] "abc"         "def_ghi_jkl"
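If adding stringr as a dependency is not desirable, the same capture-group idea can probably be expressed in base R with regexec and regmatches. This is a sketch of an alternative, not something from the original answer; the variable names x and m are only illustrative.

```r
x <- "abc_def_ghi_jkl"

# regexec returns the positions of the full match and of each capture group;
# regmatches extracts the corresponding substrings.
m <- regmatches(x, regexec("(.*)_((?:[^_]*_){1}[^_]*$)", x, perl = TRUE))

# Element 1 is the full match, elements 2 and 3 are the two groups.
m[[1]][2:3]
# [1] "abc_def" "ghi_jkl"
```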

Upvotes: 1
