Vitruvie

Reputation: 2327

How do you reference repeated nested capture groups?

I'm writing Java code and recently delegated some of the methods in this list-of-actions class to a BasicActions class. I now want to update every method in this class to call the corresponding BasicActions method instead. The methods I'm updating and the methods they should call have the same names and arguments, so I'm trying to rewrite the code with a regex. I can't figure out how to handle the method arguments: there can be any number of them, and I can't simply copy the whole group of groups, because I need to strip keywords from it.

Example input:
public void jumpTo(final double x, double y) {
    /*arbitrary code,
    possibly spanning multiple lines*/
}
Desired output:
public void jumpTo(double x, double y) {
    addAction(BasicActions.jumpTo(x, y));
}
Almost-correct solution:
pattern: (public void ([a-zA-Z]*)\(((final )?([a-zA-Z]+) ([a-zA-Z]+(, )?))*\) \{\n *)((.*\n)*?)(    })
replacement: $1addAction(BasicActions.$2($6));\n$10
Almost-correct output: (doesn't remove unnecessary 'final' keywords, only captures the final argument)
public void jumpTo(final double x, double y) {
    addAction(BasicActions.jumpTo(y));
}

See the almost-solution in action at https://regex101.com/r/uE7aA1/1

My problem is that, because I can't include the type keyword (double in this case), I have to split out the variable names, which are then captured multiple times. How can I access the multiple captures, or otherwise reformat the multiple arguments as they are copied?

Upvotes: 1

Views: 1558

Answers (2)

Vitruvie

Reputation: 2327

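In most regex engines (including the PCRE flavor used on regex101 and Java's java.util.regex), a repeated capture group only exposes its last iteration; the earlier iterations cannot be referenced. If you want to modify every iteration, as in this situation, you must therefore apply multiple regexes in sequence: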

Step 1: Copy arguments list into position (https://regex101.com/r/uE7aA1/2)

pattern: (public void (\w+\((?:(?:final )?\w+ \w+(?:, )?)*\))) \{(?:.|\n)*?\n    \}
replacement: $1 {\n        addAction(BasicActions.$2);\n    }
output:
public void jumpTo(final double x, double y) {
    addAction(BasicActions.jumpTo(final double x, double y));
}

Step 2: Remove final

pattern: "final " (without the quotes; note the trailing space)
replacement: (empty)
output:
public void jumpTo(double x, double y) {
    addAction(BasicActions.jumpTo(double x, double y));
}

Step 3: Remove type keywords (https://regex101.com/r/kC0nA3/3)

Use a lookahead to match each argument without consuming the arguments that follow it.
pattern: \w+ (\w+)(?=(, \w+ \w+)*\)\);\n    })
replacement: $1
output:
public void jumpTo(double x, double y) {
    addAction(BasicActions.jumpTo(x, y));
}
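
Since the question mentions Java, here is a minimal sketch of the same three passes chained with String.replaceAll. The class and method names (ThreePassRewrite, rewrite) are made up for illustration, and it assumes, like the patterns above, that each method sits inside a class so its closing brace is indented by exactly four spaces. The only change to the patterns is that the closing brace in the step 3 lookahead is escaped.

```java
class ThreePassRewrite {
    // Step 1: capture "public void name(args)" as $1 and "name(args)" as $2,
    // then swallow the old body up to the first brace indented by 4 spaces.
    static final String COPY_ARGS =
        "(public void (\\w+\\((?:(?:final )?\\w+ \\w+(?:, )?)*\\))) \\{(?:.|\\n)*?\\n    \\}";

    // Step 3: match "type name" only when nothing but further "type name"
    // pairs separate it from the "));" that closes the generated call.
    static final String DROP_TYPES =
        "\\w+ (\\w+)(?=(, \\w+ \\w+)*\\)\\);\\n    \\})";

    static String rewrite(String source) {
        // Step 1: copy the argument list into the new one-line body.
        String copied = source.replaceAll(
            COPY_ARGS, "$1 {\n        addAction(BasicActions.$2);\n    }");
        // Step 2: remove every "final " (note: this hits the whole input).
        String withoutFinal = copied.replace("final ", "");
        // Step 3: keep only the argument names inside the generated call.
        return withoutFinal.replaceAll(DROP_TYPES, "$1");
    }

    public static void main(String[] args) {
        String input =
            "    public void jumpTo(final double x, double y) {\n"
            + "        /*arbitrary code,\n"
            + "        possibly spanning multiple lines*/\n"
            + "    }\n";
        System.out.print(rewrite(input));
    }
}
```

Be aware that step 2 is a plain text replacement, so it also strips "final " anywhere else in the input, exactly as the editor version does.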

Upvotes: 0

Pierre

Reputation: 570

When a capturing group is repeated, only its last iteration is captured. With the regular expression

public void ([a-zA-Z]*)\((?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))*\)  

where (?: ) is used to avoid unnecessary capturing complexity and numbering from hell, $0 is the whole match, $1 is "jumpTo" and $2 is "y". Unfortunately, x cannot be captured that way.
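
To see this behaviour in Java, here is a small sketch using java.util.regex with the expression above. The class and method names are invented for illustration. The second method is not part of the regex-only approach in this answer: it is a different technique, assuming you can run code rather than an editor's search and replace, that grabs the whole argument list once and then walks it with a find() loop to recover every name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The expression from above: group 2 sits inside a repeated group.
    static final Pattern REPEATED = Pattern.compile(
        "public void ([a-zA-Z]*)\\((?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))*\\)");

    // Alternative (an assumption, not from the answer): capture the raw
    // argument list as one group, then scan it parameter by parameter.
    static final Pattern HEADER =
        Pattern.compile("public void ([a-zA-Z]*)\\(([^)]*)\\)");
    static final Pattern PARAM =
        Pattern.compile("(?:final )?[a-zA-Z]+ ([a-zA-Z]+)");

    // Returns what group 2 holds after the match: the last iteration only.
    static String lastCapture(String signature) {
        Matcher m = REPEATED.matcher(signature);
        return m.find() ? m.group(2) : null;
    }

    // Returns every parameter name by running a second matcher in a loop.
    static List<String> allNames(String signature) {
        List<String> names = new ArrayList<>();
        Matcher header = HEADER.matcher(signature);
        if (header.find()) {
            Matcher param = PARAM.matcher(header.group(2));
            while (param.find()) {
                names.add(param.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String signature = "public void jumpTo(final double x, double y) {";
        System.out.println(lastCapture(signature));
        System.out.println(allNames(signature));
    }
}
```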

You might need to explode the regex by repeating the parameter sub-pattern once per possible parameter. I do it 3 times here (you might need more):

public void ([a-zA-Z]*)\((?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?\)

$1 is "jumpTo", $2 is "x, ", $3 is "y" and $4 is empty ("").

Numbering is simple because non-capturing groups are not counted.

The method body (/*arbitrary code*/) can be matched with a simpler non-capturing rule:

\{(?:.|\n)*?\n    \} 

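and the replacement rule (numbered for a six-parameter version of the pattern above, with $1 as the method name and $2 to $7 as the parameters)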

{\n    addAction(BasicActions.$1($2$3$4$5$6$7));\n    }\n

The final regexp, for up to 6 parameters, would be (split over multiple lines for readability):

(?#header match starts from here)
(public void ([a-zA-Z]*)\(
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 1)
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 2)
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 3)
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 4)
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 5)
(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?(?#param 6)
\))
(?#match body starts here) \{(?:.|\n)*?\n    \}
()()()()()()(?#for missing params 1-6)

where

$1 is the original function prototype

$2 is the name taken from the function name

$3 is the first parameter (or an empty string)

$4 is the second parameter (or an empty string)

$5 is the third parameter (or an empty string)

... and so on, up to $8 for the sixth parameter. With this numbering, the replacement rule becomes

$1 {\n    addAction(BasicActions.$2($3$4$5$6$7$8));\n    }

It is easy to extend this to 7, 8, ... parameters, at the cost of a much longer regular expression, and the capturing groups remain easy to count. The trailing ()()... is meant to ensure that the capturing groups are empty regardless of the number of parameters (this depends on the regex engine implementation). Some regex engines might reject an empty (), but very few detect a deliberate match of the empty string such as

((?x:))

which is a capturing group wrapped around an empty non-capturing group.
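
For anyone doing this from Java rather than an editor, the exploded pattern does not have to be typed by hand. Below is a rough sketch, with made-up names (ExplodedParams, build, rewrite), that assembles it with String.repeat (Java 11+) and applies the replacement with $1 as the prototype, $2 as the name and $3 onwards as the parameters. It assumes the closing brace of each method is indented by four spaces, as above. In java.util.regex a reference to a group that did not take part in the match is replaced by nothing, so the trailing empty groups are left out here.

```java
import java.util.regex.Pattern;

class ExplodedParams {
    // One optional parameter: optional "final ", a type, then the captured
    // name together with its ", " separator when one follows.
    static final String PARAM =
        "(?:(?:final )?[a-zA-Z]+ ([a-zA-Z]+(?:, )?))?";

    // Header with maxParams optional parameters, followed by the body rule.
    static Pattern build(int maxParams) {
        return Pattern.compile(
            "(public void ([a-zA-Z]*)\\(" + PARAM.repeat(maxParams) + "\\))"
            + " \\{(?:.|\\n)*?\\n    \\}");
    }

    static String rewrite(String source, int maxParams) {
        // Parameters occupy groups 3 .. maxParams + 2.
        StringBuilder args = new StringBuilder();
        for (int group = 3; group < 3 + maxParams; group++) {
            args.append('$').append(group);
        }
        String replacement =
            "$1 {\n        addAction(BasicActions.$2(" + args + "));\n    }";
        return build(maxParams).matcher(source).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input =
            "    public void jumpTo(final double x, double y) {\n"
            + "        /*arbitrary code,\n"
            + "        possibly spanning multiple lines*/\n"
            + "    }\n";
        System.out.print(rewrite(input, 6));
    }
}
```

Because $1 is copied back unchanged, any "final" in the signature stays in place, and methods with more than maxParams parameters simply do not match and are left as they were.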

(Edited many times because of typos and layout; (.|\n) looks like a smiley to a half-blind man reading a tortuous regular expression.)

Upvotes: 2
