Greg Ruhl

Reputation: 1114

Ruby regex - using optional named backreferences

I am trying to write a Ruby regex that returns a set of named matches. If the first element (elements are delimited by slashes) appears again anywhere later in the string, I want the match to return everything from that second occurrence onward. Otherwise, it should return the whole string. The closest I've gotten is (?<p1>top_\w+).*?(?<hier>\k<p1>.*), which doesn't work for the 3rd item. I've tried regex if-then-else constructs, but Rubular says they're invalid. I've also tried (?<p1>[\w\/]+?)(?<hier>\k<p1>.*), which correctly splits the 1st and 4th lines but doesn't work for the others. Please note: I want all results returned under the same named reference so I can iterate through "hier".

Input:

top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog

Output:

hier = top_cat/mouse/dog/elephant/horse
hier = top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
hier = top_bat/car[0]
hier = top_2/top_1/top_3/top_4/dog

Upvotes: 1

Views: 243

Answers (2)

Vasili Syrakis

Reputation: 9601

Problem

It does not match the second line because the second instance of hat is not followed by a slash, but the first instance is.

Solution

Specify that there is a slash between the first and second match.

Regex

(top_.*)/(\1.*$)|(^.*$)

Replacement

hier = \2\3
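
As a rough sketch of how the regex and replacement above might be applied in Ruby, the snippet below runs them through String#sub on the question's four inputs. The constant and method names (HIER_PATTERN, hier_line) are hypothetical and exist only for illustration. It relies on Ruby substituting an empty string for a group that did not take part in the match, so \2\3 yields whichever alternative matched.

```ruby
# Hypothetical names; the pattern and replacement are the ones from this answer.
HIER_PATTERN = %r{(top_.*)/(\1.*$)|(^.*$)}

# Returns "hier = ..." for a single path string.
def hier_line(path)
  # Single quotes keep \2 and \3 as backreferences for sub.
  path.sub(HIER_PATTERN, 'hier = \2\3')
end

inputs = [
  "top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse",
  "top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool",
  "top_bat/car[0]",
  "top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog"
]

inputs.each { |path| puts hier_line(path) }
```

Note that this pattern looks for a repeated leading prefix rather than a repeated first element, which gives the same result on these inputs.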

Example

Regex101 Permalink


More info on the Alternation token

To see how the | token works in a regex, consider the example abc|def.
In plain English, this regex means:

  • Match either the regex below (attempting the next alternative only if this one fails)
    • Match the characters abc literally
  • Or match the regex below (the entire match attempt fails if this one fails to match)
    • Match the characters def literally

Example
Regex: alpha|alphabet
Given the phrase "I know the alphabet", only alpha would be matched, because the first alternative succeeds before alphabet is ever tried.
However, if we changed the regex to alphabet|alpha, we would match alphabet.

As you can see, alternatives are tried from left to right.
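
The ordering effect can be checked quickly in Ruby. This is a minimal sketch using String#[] with a regex, which returns the matched substring; the variable names are illustrative only.

```ruby
phrase = "I know the alphabet"

# The first alternative that succeeds at a position wins,
# so the shorter "alpha" is taken here.
short_first = phrase[/alpha|alphabet/]

# Putting the longer alternative first lets it match in full.
long_first = phrase[/alphabet|alpha/]

puts short_first  # alpha
puts long_first   # alphabet
```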

Upvotes: 1

Matt

Reputation: 20786

paths = %w(
  top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
  top_bat/car[0]
  top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
  test/test
)

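paths.each do |path|
  # Capture the first element, then greedily find its last reappearance as a whole element.
  md = path.match(/^([^\/]*).*\/(\1(\/.*|$))/)
  # Fall back to the whole path when the first element never reappears.
  hier = md ? md[2] : path
  puts hier
end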

Output:

top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/dog
test
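
Since the question asks for the result under a named reference, here is one possible variant of the same regex using named groups. It is a sketch, not part of the original answer: the names HIER_RE and hier_for are hypothetical, and the fallback to the whole path is still done in Ruby rather than inside the regex.

```ruby
# Same idea as the regex above, with named groups p1 and hier.
# (?:...) is used for the inner group because Ruby does not capture
# unnamed groups when named groups are present.
HIER_RE = %r{^(?<p1>[^/]*).*/(?<hier>\k<p1>(?:/.*|$))}

# Returns the named capture when the first element reappears,
# otherwise the whole path.
def hier_for(path)
  md = path.match(HIER_RE)
  md ? md[:hier] : path
end

%w(
  top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
  top_bat/car[0]
  top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
).each { |path| puts "hier = #{hier_for(path)}" }
```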

Upvotes: 1
