Bobbh
Bobbh

Reputation: 23

VBA regexp pattern combining match and not matching a pattern

Trying to extract valid phone numbers from a string with no visible delimiters between different data types. In effect, the data around the potential phone number is random and irrelevant.


What should be matched. I am trying to match either one of the following:

Only the 1st phone number is to be used, so I used Global = False in the code below. I could make it even more robust, but I've examined the data and this should be enough.


Working pattern.Here's a code snippet from a function (it returns the matching phone number) that contains the pattern that worked.

With regex
    .Global = False
    .Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}"
'this works but does detect an extension as explained above
End With


What should not be matched. I realized that I also need to search for an extension next to the phone number (i.e. [phone number][white space]x#) and if that exists, to treat the phone number as invalid (.test should evaluate to false).


Failed attemps. They all failed (even valid phone numbers had .test evaluate to false):

.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^(\s?x\d)"
'detect not([optional single white space]x#), added "^(\s?x\d)"
'or
.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^((\s+?)[x]\d)"
'detect not([optional multiple white space]x#), added "^((\s+?)[x]\d)"

Not sure how to combine positive match tests and negative (not) match tests in the same pattern.


Work-arounds I've tried. When I couldn't get it to work, I tried the following Like patterns (using VBA 'Like', prior to calling the function that utilized Regexp) and that also failed (all evaluated to false even with test strings that contained examples such as "...1x2" or "5 x33" or "...7 x444"; with patterns like "*#x#*", "*#{ x}#*", ""*#{ x}#*".

Here is the code snippet to test the Like function:

If Not (OrigNum Like "*#x#" Or OrigNum Like "*#[ x}#" Or OrigNum Like "*#[  x]#*") Then
    Debug.Print "No #x# in string"
    OrigNum = ExtractPhoneNumber(OrigNum)
Else
    Debug.Print "#x# in string"
End If

Every string evaluated ended up causing "No x# in string" to be displayed (evaluated to false), even when the string contained the above examples, which should have evaluated to true and "#x# in string" being displayed.

Dazed and confused for so long it's...OK, enough of the Led Zepp reference :-)

Upvotes: 2

Views: 1388

Answers (1)

Mariano
Mariano

Reputation: 6511

Phone number:

  • [optional '1'][optional '(']###[optional')'[[random or no white space]###-####
  • ###[optional '-']###[optional '-']####

*I removed a comma I assumed as a typo, and also assuming the leading 1 is optional for both cases from what I read from your patterns.

Don't match:

  • [phone number][white space]x#


What you're looking for is negative lookaheads.

(?! subexpression ) asserts for that subexpression from the current position and, if the subexpression matches, the match attempt fails (i.e. not followed by).
E.g. (?!\s*x\d) fails when the current position is followed by optional whitespace, an "x" and a digit.


Regex:

1?(?:\(\d{3}\)|\d{3}-?)\s*\d{3}-?\d{4}(?!\s*x\d)

Code:

Public Function getPhoneNumber(strInput As String) As Variant
    Dim regex As New RegExp
    Dim matches As Object

    regex.Pattern = "1?(?:\(\d{3}\)\s*|\d{3}-?)\d{3}-?\d{4}(?!\s*x\d)"
    regex.Global = False

    Set matches = regex.Execute(strInput)
    If matches.Count = 0 Then
        getPhoneNumber = CVErr(xlErrNA)
    Else
        getPhoneNumber = matches(0).Value
    End If
End Function

Results (šŸŽµAs it was, then again it will be; though the course may change sometimesšŸŽµ):

Screenshot from Excel showing the input un column A and the result of the function in column B

Upvotes: 1

Related Questions