Mike Viens

Reputation: 2517

Regex to exclude matches based on pattern

I am trying to create a regex (Perl-compatible, but not Perl itself) that matches the following criteria:

The regex I have come up with so far is:

^(.(?!\b(?:r)\d*\b))*$

Below is a table of example input strings with the desired and actual results. Some work, some fail.

+-------------------------------+---------------+--------------+
|         Input string          | Desired Match | Actual Match |
+-------------------------------+---------------+--------------+
| Some text                     | yes           | yes          |
| Some textr1                   | yes           | yes          |
| Some text default(r3)         | yes           | NO           |
| Some text default(abc r3)     | yes           | NO           |
| Some text default(r3 xyz)     | yes           | NO           |
| Some text default(abc r3 xyz) | yes           | NO           |
| Some text r12 default(r3)     | no            | no           |
| Some text r1                  | no            | no           |
| Some r1 text                  | no            | no           |
| \sR12 Some text               | no            | no           |
| Some text r1 somethingElse    | no            | no           |
| R1                            | no            | YES          |
| \s\sR2                        | no            | no           |
| R3\s\s                        | no            | YES          |
| \tr4                          | no            | no           |
| \t\sR5\t                      | no            | no           |
+-------------------------------+---------------+--------------+
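To reproduce the Actual Match column, here is a minimal Python sketch. It assumes the pattern is applied case-insensitively and that \s and \t in the table stand for a real space and a real tab; both are inferred from the results, not stated above. The names ATTEMPT, ROWS and attempt_matches are only for illustration.

```python
import re

# The attempted pattern from above. IGNORECASE is an assumption inferred
# from the table (rows with an uppercase R are rejected).
ATTEMPT = re.compile(r"^(.(?!\b(?:r)\d*\b))*$", re.IGNORECASE)

# (input, value from the "Actual Match" column), with \s and \t written
# as a real space and tab (assumption).
ROWS = [
    ("Some text", True),
    ("Some textr1", True),
    ("Some text default(r3)", False),
    ("Some text default(abc r3 xyz)", False),
    ("Some text r1", False),
    ("R1", True),
    ("R3  ", True),
    ("\t R5\t", False),
]


def attempt_matches(text):
    """Return True if the attempted pattern matches the whole string."""
    return ATTEMPT.match(text) is not None


for text, actual in ROWS:
    print(repr(text), attempt_matches(text), "table says", actual)
```

Tracing the sketch suggests two causes for the wrong rows. The lookahead only runs after `.` has consumed a character, so an r1 at the very start of the string is never tested (R1, R3\s\s). And the lookahead knows nothing about parentheses, so any r3 inside default(...) is rejected as well.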

Can anyone provide a working regex?

Mike V.

Upvotes: 2

Views: 145

Answers (1)

Casimir et Hippolyte

Reputation: 89639

You can use this pattern:

(?i)^(?>[^r(]++|(?<!\\[ts])\Br|r(?![0-9])|(\((?>[^()]++|(?1))*\))|\()++$

Pattern details:

(?i)                  # modifier: case insensitive
^                     # anchor: beginning of the string
(?>                   # open an atomic group
    [^r(]++           # all characters except r and opening parenthesis
  |                   # OR
    (?<!\\[ts])\Br    # r not at a word boundary and not preceded by a literal \t or \s
  |                   # OR
    r(?![0-9])        # r (with word boundary or preceded by \t or \s) not followed by a digit
  |                   # OR
    (                 # (parentheses, nested or not): open the capture group n°1
        \(            # literal: (
        (?>           # open an atomic group
            [^()]++   # all characters except parentheses
          |           # OR
            (?1)      # (recursion): repeat the subpattern of the capture group n°1
        )*            # repeat the atomic group (the last) zero or more times
        \)            # literal: )
    )                 # close the first capturing group
  |                   # OR
    \(                # for possible isolated opening parenthesis
)++                   # repeat the first atomic group one or more times
$                     # anchor: end of the string

Note: if \t and \s in your post stand for actual whitespace characters rather than literal backslash sequences, you can remove (?<!\\[ts]).
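Python's standard re module has no (?1) recursion, so the pattern above cannot be pasted into it as is. As a rough sketch of the same logic without a regex, assuming the (?<!\\[ts]) lookbehind is dropped (that is, \t and \s are real whitespace), the alternation can be written as a small scanner. The function names find_close, is_word_char and accepts are hypothetical, and is_word_char uses Python's Unicode-aware isalnum, which is slightly broader than PCRE's default ASCII \w.

```python
def find_close(text, start):
    """Return the index of the ')' balancing the '(' at text[start], or -1."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def is_word_char(ch):
    # Approximation of \w (assumption: Unicode letters and digits count too).
    return ch.isalnum() or ch == "_"


def accepts(text):
    """Mirror the alternation: reject an r/R that starts a word and is
    followed by a digit, unless it sits inside balanced parentheses."""
    if not text:
        return False  # the outer ++ needs at least one item
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            close = find_close(text, i)
            # Balanced group: skip it whole. Isolated "(": step over it.
            i = close + 1 if close != -1 else i + 1
            continue
        if ch in "rR":
            starts_word = i == 0 or not is_word_char(text[i - 1])
            digit_next = i + 1 < len(text) and text[i + 1] in "0123456789"
            if starts_word and digit_next:
                return False
        i += 1
    return True
```

For example, accepts("Some text default(abc r3 xyz)") returns True because the r3 is skipped along with its parenthesized group, while accepts("R3  ") returns False because the R starts the string and is followed by a digit.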

Upvotes: 4
