Matt

Reputation: 3760

JavaScript regular expression with word boundary requirement

I'm writing a regex for a phone number that is part of a larger string and I'm having a bit of trouble with one particular requirement.

The basic requirement is to match standard Australian phone numbers with an area code or the international prefix, allowing a little leeway for spaces and hyphens, in the following styles:

0395551234
03 9555 1234
03-9555-1234
+61395551234

However, I only want to match a number if it has a word boundary before the first character and after the last character, so the following styles should not match:

0395551234word
03 9555 1234word
03-9555-1234word
+61395551234word
word0395551234
word03 9555 1234
word03-9555-1234
word+61395551234

This is my regex string:

((\+61[ \-]?[2378]|\b0[2378]|\(0[2378]\))[ \-]?[0-9]{4}[ \-]?[0-9]{4}\b)

but it's not correct because it will still match on:

word+61395551234
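
A minimal sketch that reproduces this in JavaScript. The variable names `original` and `hit` are illustrative only, and the pattern is the one above written as a regex literal:

```javascript
// Pattern from the question, unchanged, as a JavaScript regex literal.
const original = /((\+61[ \-]?[2378]|\b0[2378]|\(0[2378]\))[ \-]?[0-9]{4}[ \-]?[0-9]{4}\b)/;

// Nothing in the pattern constrains what precedes "+", so the number
// is still found inside "word+61395551234".
const hit = original.exec("word+61395551234");
console.log(hit && hit[0]); // "+61395551234"

// The 0-prefixed form is correctly rejected, because \b cannot match
// between "d" and "0" (both are word characters).
console.log(original.test("word0395551234")); // false

// Putting \b before \+ would not help: a boundary exists between "d" and "+",
// but not between the start of the string (or a space) and "+".
console.log(/\b\+61/.test("word+61")); // true
console.log(/\b\+61/.test("+61"));     // false
```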

I can't use the word boundary \b before the + character because + is not a word character. I'm also using JavaScript, so I can't use a positive lookbehind to match the + only when it is at the start of the line or preceded by a whitespace character. If I could, this Perl regular expression would meet my requirement:

(((?<=^|\s)\+61[ \-]?[2378]|\b0[2378]|\(0[2378]\))[ \-]?[0-9]{4}[ \-]?[0-9]{4}\b)

I also can't do any extra processing on the large string that I'm working with, because I need to retain its structure exactly, so no replace() operations before trying the match.

Upvotes: 2

Views: 72

Answers (1)

anubhava

Reputation: 786329

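You cannot use \b before the + character because + is not a word character. Instead, put (?:^|\s) before the + to require that it comes at the start of the line or after a whitespace character. Note that this group consumes the preceding whitespace, so the match will include it.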

Use this regex:

(((?:^|\s)\+61[ -]?[2378]|\b0[2378]|\(0[2378]\))[ -]?[0-9]{4}[ -]?[0-9]{4}\b)
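
A rough sketch of how this regex could be checked against the cases from the question. The `g` flag, the helper name `findPhones`, and the trimming step are assumptions added for illustration; they are not part of the regex above:

```javascript
// Regex from this answer. The "g" flag is an added assumption so that
// every number in a larger string is found.
const phoneRe = /(((?:^|\s)\+61[ -]?[2378]|\b0[2378]|\(0[2378]\))[ -]?[0-9]{4}[ -]?[0-9]{4}\b)/g;

// Hypothetical helper: returns the matched numbers without modifying `text`.
// (?:^|\s) consumes the whitespace before "+", so each match is trimmed.
function findPhones(text) {
  return (text.match(phoneRe) || []).map((m) => m.trim());
}

const shouldMatch = ["0395551234", "03 9555 1234", "03-9555-1234", "+61395551234"];
const shouldNotMatch = [
  "0395551234word", "03 9555 1234word", "03-9555-1234word", "+61395551234word",
  "word0395551234", "word03 9555 1234", "word03-9555-1234", "word+61395551234",
];

console.log(shouldMatch.every((s) => findPhones(s).length === 1));    // true
console.log(shouldNotMatch.every((s) => findPhones(s).length === 0)); // true

// Inside a larger string; the original string is left untouched.
console.log(findPhones("call +61395551234 or 03 9555 1234"));
// ["+61395551234", "03 9555 1234"]
```

Without the `m` flag, ^ only matches at the very start of the string. If the + form should also be accepted at the start of each line in a multi-line string, adding `m` would be needed.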


Upvotes: 2
