sdpl.cs

Reputation: 13

Struggling with negative lookahead in a regex

I humbly request some guidance on using multiple negative lookaheads in a regex. I currently have a string that I am testing against two regular expressions.

String: Armadale Joe Bloggs 22-333-222 20001 Whitfords to Butler

  1. ^Armadale\D+\d{2}-\d{3}-\d{2}\D+2\d{4}\D+$
  2. (Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)\D+(?![0-9]{2}-[0-9]{3}-[0-9]{2})2[0-9]{4}\D+$

How can I modify Regex 2 so that it doesn't match the string?

Shouldn't the negative lookahead (?![0-9]{2}-[0-9]{3}-[0-9]{2}) prevent 22-333-22 from matching?

Instead, I would like Regex 2 to match 22-333-333, 333-333-22 or 22-22-22.

Any help would be highly appreciated.

Cheers,

Trav.

Upvotes: 0

Views: 84

Answers (2)

Dmitry Egorov

Reputation: 9650

Your explanation suggests your initial sample string should be "Armadale Joe Bloggs 22-333-22 20001 Whitfords to Butler", i.e. with only two digits in the third digit group. So the dash-separated digit groups should have the lengths 2-3-2.

Now, what you want your new regex to do is filter out the 2-3-2 series but allow others, for example 2-3-3, 3-3-2, or 2-2-2. You were close by adding the (?![0-9]{2}-[0-9]{3}-[0-9]{2}) lookahead, but you left out a pattern that actually matches the digit series after it. Here's a corrected version:

^(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)\D+(?![0-9]{2}-[0-9]{3}-[0-9]{2}\D+)\d+-\d+-\d+\D+2[0-9]{4}\D+$

(Demo: https://regex101.com/r/vI0tY3/1)

In the regex above I used the generic \d+-\d+-\d+ pattern, which you may wish to make more restrictive.
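To check the behaviour described above, here is a small sketch using Python's re module (Python is an assumption; the thread does not name a language). It builds the sample line around each digit series and reports whether the corrected pattern accepts it. The helper accepts() is hypothetical and exists only for this illustration.

```python
import re

# Dmitry's pattern: the negative lookahead rejects a 2-3-2 digit series that is
# followed by a non-digit; \d+-\d+-\d+ then consumes whatever series is left.
PATTERN = re.compile(
    r"^(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)\D+"
    r"(?![0-9]{2}-[0-9]{3}-[0-9]{2}\D+)"
    r"\d+-\d+-\d+\D+2[0-9]{4}\D+$"
)


def accepts(series: str) -> bool:
    """Hypothetical helper: build the sample line around `series` and test PATTERN."""
    line = f"Armadale Joe Bloggs {series} 20001 Whitfords to Butler"
    return PATTERN.search(line) is not None


for series in ["22-333-22", "22-333-333", "333-333-22", "22-22-22"]:
    print(series, accepts(series))
# 22-333-22 False
# 22-333-333 True
# 333-333-22 True
# 22-22-22 True
```

The trailing \D+ inside the lookahead matters. Without it, 22-333-333 would be rejected as well, because its first nine characters (22-333-33) already satisfy [0-9]{2}-[0-9]{3}-[0-9]{2}.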

Upvotes: 0

Wiktor Stribiżew

Reputation: 626845

Your current regex is not working because the lookahead checks for an XX-XXX-XX substring at the current position, but the characters that must follow that position are five digits (2[0-9]{4}), so the negative lookahead is always true and has no effect. Remove the lookahead and you will match Armadale Joe Bloggs 20001 Whitfords to Butler, but not Armadale Joe Bloggs 22-333-222 20001 Whitfords to Butler (which your current regex does not match either, because \D+ does not allow any digit to appear before the 2):

(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)[^\d\n]+2[0-9]{4}[^\d\n]+$

See demo 1
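The redundancy of the lookahead can be seen directly with a short Python sketch (Python's re module is an assumption; the thread does not name a language). It runs the question's Regex 2 and the pattern above on both sample strings and shows that they always agree. It also shows that the lookahead succeeds trivially in front of a five-digit run.

```python
import re

PLACES = r"(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)"

# The question's Regex 2 (with the lookahead) and Wiktor's version (without it).
with_lookahead = re.compile(PLACES + r"\D+(?![0-9]{2}-[0-9]{3}-[0-9]{2})2[0-9]{4}\D+$")
without_lookahead = re.compile(PLACES + r"[^\d\n]+2[0-9]{4}[^\d\n]+$")

samples = [
    "Armadale Joe Bloggs 20001 Whitfords to Butler",
    "Armadale Joe Bloggs 22-333-222 20001 Whitfords to Butler",
]

for s in samples:
    # Both columns agree: the lookahead never changes the outcome.
    print(bool(with_lookahead.search(s)), bool(without_lookahead.search(s)), s)
# True True Armadale Joe Bloggs 20001 Whitfords to Butler
# False False Armadale Joe Bloggs 22-333-222 20001 Whitfords to Butler

# In front of five digits such as "20001" the lookahead body cannot match
# (it needs a hyphen as the third character), so the negative lookahead succeeds.
print(re.match(r"(?![0-9]{2}-[0-9]{3}-[0-9]{2})", "20001") is not None)  # True
```

On single-line input, [^\d\n] and \D behave the same. Excluding \n presumably just keeps a match from running across line breaks when many lines are tested at once, as in a multi-line demo.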

If you want to grab any XX(X)-XX(X)-XX(X) digit/hyphen sequences, use

\b[0-9]{2,3}(?:-[0-9]{2,3}){2}\b

See demo 2
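Here is a quick Python sketch of the extraction pattern above (Python's re module is an assumption). Note that it also picks up the 2-3-2 shape 22-333-22. The second pattern, NOT_232, is a hypothetical variant that is not part of the answer. It shows one way to reject that shape, which the question asks for, by putting a negative lookahead right after the leading \b.

```python
import re

# Wiktor's pattern: a 2-3 digit group, then exactly two "-<2-3 digits>" groups,
# with word boundaries so the series is not part of a longer digit run.
SERIES = re.compile(r"\b[0-9]{2,3}(?:-[0-9]{2,3}){2}\b")

# Hypothetical variant (not from the answer): also reject the 2-3-2 shape.
NOT_232 = re.compile(r"\b(?![0-9]{2}-[0-9]{3}-[0-9]{2}\b)[0-9]{2,3}(?:-[0-9]{2,3}){2}\b")

text = "22-333-333 333-333-22 22-22-22 22-333-22 1234-56-78"
print(SERIES.findall(text))   # ['22-333-333', '333-333-22', '22-22-22', '22-333-22']
print(NOT_232.findall(text))  # ['22-333-333', '333-333-22', '22-22-22']
```

The \b inside the lookahead plays the same role as the \D+ in the other answer. It ensures that only a complete 2-3-2 series is rejected, so 22-333-333 and 22-333-222 still match.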

You may combine the two regexes with alternation to match both the strings containing one of the listed place names and the digit-hyphen sequences:

(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)[^\d\n]+2[0-9]{4}[^\d\n]+$|\b[0-9]{2,3}(?:-[0-9]{2,3}){2}\b

See demo 3
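Here is a minimal Python sketch of the combined pattern run over a two-line text (Python's re module and the re.MULTILINE flag are assumptions, chosen so that $ matches at each line end, as in a multi-line tester). Because the first alternative contains a capturing group, re.findall would return only that group's text. The sketch therefore uses re.finditer and group(0) to get the whole matches.

```python
import re

COMBINED = re.compile(
    r"(Armadale|Fremantle|Butler|Mandurah|Midland|Thornlie)[^\d\n]+2[0-9]{4}[^\d\n]+$"
    r"|\b[0-9]{2,3}(?:-[0-9]{2,3}){2}\b",
    re.MULTILINE,  # assumed: let $ match before each newline
)

text = (
    "Armadale Joe Bloggs 20001 Whitfords to Butler\n"
    "Armadale Joe Bloggs 22-333-222 20001 Whitfords to Butler\n"
)

# group(0) is the whole match, whichever alternative produced it.
for m in COMBINED.finditer(text):
    print(repr(m.group(0)))
# 'Armadale Joe Bloggs 20001 Whitfords to Butler'
# '22-333-222'
```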

Upvotes: 0
