f.rodrigues

Reputation: 3587

Regex match pattern not value

String:

a = 10
b = 50
c = a + b

Regex:

([a-z]) = (\d+)|([a-z]) = ([a-z]) \+ ([a-z])

I want the last three groups to reuse the pattern of the first group, rather than its matched value, so that I don't have to repeat the pattern all over.

Something like

([a-z]) = (\d+)|\1 = \1 \+ \1

But instead of \1 evaluating to the matched text 'a', I want it to test whether the text matches the same pattern.

Upvotes: 2

Views: 589

Answers (2)

Jongware

Reputation: 22478

If your GREP dialect supports it: use a Named conditional construction.

(?(<name>)then|else), where name is the name of a capturing group and then and else are any valid regexes (http://www.regular-expressions.info/refadv.html).

The following regex first matches a lowercase letter followed by " = ", and then either a lowercase letter or a run of digits. That operand is stored in capturing group #2 (lowercase letter) or #3 (digits); group #1 is the enclosing group. The conditional ?(2) then tests whether group #2 matched anything. If it did, the first alternative of the conditional is tried; if not, the second one is. Note that this example refers to the group by number rather than by name.

\l = ((\l)|(\d+))(?(2) \+ \l| \+ \d+)

On a short test list

a = 10 + 15
b = 50 + b
c = a + b

this will match the first and third line but not the second.
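As a rough check of that behaviour, here is a minimal sketch using Python's re module, which also supports numbered conditionals. It assumes that \l stands for a single lowercase letter and substitutes [a-z] for it, since Python has no \l shorthand; the names CONDITIONAL and matches are only illustrative.

```python
import re

# Same structure as the regex above, with [a-z] standing in for \l
# (an assumption: Python's re has no \l shorthand).
# Group 2 captures a letter operand, group 3 a numeric operand.
CONDITIONAL = re.compile(r"[a-z] = (([a-z])|(\d+))(?(2) \+ [a-z]| \+ \d+)")

def matches(line):
    """Return True if the whole line fits the conditional pattern."""
    return CONDITIONAL.fullmatch(line) is not None

for line in ["a = 10 + 15", "b = 50 + b", "c = a + b"]:
    print(line, "->", matches(line))
```

With this pattern the second operand has to be of the same kind (letter or digits) as the first, which is why the mixed line is rejected.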

Upvotes: 1

Tim Pietzcker

Reputation: 336328

Some regex engines (for example PHP's PCRE engine, Perl and Ruby) support subroutines:

preg_match('/([a-z]) = (\d+)|((?1)) = ((?1)) \+ ((?1))/', $subject)

Note that in order to keep capturing the contents of those subroutines, you need an extra set of parentheses. So (?1) acts as a "placeholder" for [a-z], and ((?1)) captures that in a new capturing group.

If your language's regex engine doesn't support subroutines, you may still be able to build the pattern from reusable pieces with string manipulation. For example, in Python:

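>>> import re
>>> letter = "[a-z]"  # reusable fragment, without its own parentheses
>>> # each ({0}) adds exactly one capturing group around the fragment
>>> regex = re.compile(r"({0}) = (\d+)|({0}) = ({0}) \+ ({0})".format(letter))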
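To show how a pattern built this way might be used on the strings from the question, here is a small sketch. The names LETTER, PATTERN and parse are hypothetical, and the fragment is kept without its own parentheses so that the five capturing groups are numbered as in the question's regex.

```python
import re

LETTER = "[a-z]"  # hypothetical name for the reusable fragment

# Groups 1-2 belong to the "x = 10" form, groups 3-5 to the "x = y + z" form.
PATTERN = re.compile(
    r"({0}) = (\d+)|({0}) = ({0}) \+ ({0})".format(LETTER)
)

def parse(line):
    """Return (name, value) or (name, left, right); None if nothing fits."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    if m.group(1) is not None:
        # First alternative matched: a plain numeric assignment.
        return (m.group(1), int(m.group(2)))
    # Second alternative matched: a sum of two single-letter names.
    return (m.group(3), m.group(4), m.group(5))

for line in ["a = 10", "b = 50", "c = a + b"]:
    print(parse(line))
```

Because the fragment is inserted as text, changing LETTER in one place changes every occurrence, which is the effect the question asks for.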

Upvotes: 1
