neradis

Reputation: 273

RegExp (PCRE or Emacs): Repetition of previously defined group

Is there a RegExp syntax that allows repeating a group definition that appeared earlier in the same RegExp? Please note: I want to 'copy' the group definition itself; I am not interested in a backreference to the match of a previous group (i.e. "\n" is not what I am looking for).

For example: I am looking for a RegExp that will match "spamniceggs", "eggswithspam", "spamlovelyspam", and "eggeggspam", but neither "spamwithham" nor "deliciousegg".

A possible PCRE RegExp would be:

((?:spam)|(?:egg))\w*((?:egg)|(?:spam))

In this and similar cases it would be nice to avoid explicitly repeating an identical group description (DRY). So I am looking for a hypothetical operator "~n" with the following semantics: reapply the same group description as for the n-th capturing group. The example RegExp could then be expressed as:

((?:spam)|(?:egg))\w*~1

Is there any way to achieve something along these lines?

Upvotes: 2

Views: 386

Answers (2)

tripleee

Reputation: 189327

There is no facility for anything like this in Emacs regular expressions, but the surrounding language makes it easy enough. In Lisp:

(let* ((s "spam")
       (e "egg")
       ;; "spam or egg" group, defined once
       (sore (concat "\\(" s "\\|" e "\\)"))
       ;; the same group definition reused on both sides
       (regex (concat sore "[A-Za-z]*" sore)))
  (... do stuff with regex ...))

In C, you can similarly build the regex in a string with e.g. sprintf.
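The same build-the-pattern-from-strings idea carries over to most host languages. As a minimal sketch, here it is in Python (not one of the languages asked about, used purely for illustration). The names s, e, sore, and matches are hypothetical and simply mirror the Lisp snippet above:

```python
import re

# Hypothetical names, chosen to mirror the Lisp snippet above.
s = "spam"
e = "egg"

# The group definition is written exactly once ...
sore = f"(?:{s}|{e})"

# ... and then spliced into the full pattern twice.
regex = re.compile(f"{sore}[A-Za-z]*{sore}")


def matches(text: str) -> bool:
    """Return True if text contains spam-or-egg, letters, then spam-or-egg."""
    return regex.search(text) is not None


if __name__ == "__main__":
    # Sample strings taken from the question.
    for word in ["spamniceggs", "eggswithspam", "spamlovelyspam",
                 "eggeggspam", "spamwithham", "deliciousegg"]:
        print(word, matches(word))
```

This only reuses the pattern text at construction time; the regex engine itself still sees two separate, identical groups.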

Edit: I had overlooked (?(DEFINE)...) in PCRE. I'm leaving this answer in for the Emacs / general case.

Upvotes: 5

Alexandr Ciornii

Reputation: 7394

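If you mean something like qr// in Perl, PCRE does not have it; use (?(DEFINE)...) and (?&name) instead. These features were copied from Perl 5.10 into PCRE. The example below is written with free-spacing whitespace, so it needs the x (extended) modifier. Example for an IP address: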

(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
         \b (?&byte) (\.(?&byte)){3} \b

Upvotes: 4
