Reputation: 23
Trying to extract valid phone numbers from a string with no visible delimiters between different data types. In effect, the data around the potential phone number is random and irrelevant.
What should be matched. I am trying to match either one of the following:
[random garbage][optional '1'][optional '(']###[optional')'[[random or no white space]###-####[random garbage]
or [random garbage]###[optional '-']###,[optional '-']####[random garbage]
Only the 1st phone number is to be used, so I used Global = False
in the code below. I could make it even more robust, but I've examined the data and this should be enough.
Working pattern.Here's a code snippet from a function (it returns the matching phone number) that contains the pattern that worked.
With regex
.Global = False
.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}"
'this works but does detect an extension as explained above
End With
What should not be matched. I realized that I also need to search for an extension next to the phone number (i.e. [phone number][white space]x#
) and if that exists, to treat the phone number as invalid (.test
should evaluate to false).
Failed attemps. They all failed (even valid phone numbers had .test
evaluate to false):
.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^(\s?x\d)"
'detect not([optional single white space]x#), added "^(\s?x\d)"
'or
.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^((\s+?)[x]\d)"
'detect not([optional multiple white space]x#), added "^((\s+?)[x]\d)"
Not sure how to combine positive match tests and negative (not) match tests in the same pattern.
Work-arounds I've tried. When I couldn't get it to work, I tried the following Like
patterns (using VBA 'Like', prior to calling the function that utilized Regexp) and that also failed (all evaluated to false even with test strings that contained examples such as "...1x2"
or "5 x33"
or "...7 x444"
; with patterns like "*#x#*"
, "*#{ x}#*"
, ""*#{ x}#*"
.
Here is the code snippet to test the Like function:
If Not (OrigNum Like "*#x#" Or OrigNum Like "*#[ x}#" Or OrigNum Like "*#[ x]#*") Then
Debug.Print "No #x# in string"
OrigNum = ExtractPhoneNumber(OrigNum)
Else
Debug.Print "#x# in string"
End If
Every string evaluated ended up causing "No x# in string"
to be displayed (evaluated to false), even when the string contained the above examples, which should have evaluated to true and "#x# in string"
being displayed.
Dazed and confused for so long it's...OK, enough of the Led Zepp reference :-)
Upvotes: 2
Views: 1388
Reputation: 6511
Phone number:
[optional '1'][optional '(']###[optional')'[[random or no white space]###-####
###[optional '-']###[optional '-']####
*I removed a comma I assumed as a typo, and also assuming the leading 1
is optional for both cases from what I read from your patterns.
Don't match:
[phone number][white space]x#
What you're looking for is negative lookaheads.
(?! subexpression )
asserts for that subexpression from the current position and, if the subexpression matches, the match attempt fails (i.e. not followed by).
E.g. (?!\s*x\d)
fails when the current position is followed by optional whitespace, an "x" and a digit.
Regex:
1?(?:\(\d{3}\)|\d{3}-?)\s*\d{3}-?\d{4}(?!\s*x\d)
Code:
Public Function getPhoneNumber(strInput As String) As Variant
Dim regex As New RegExp
Dim matches As Object
regex.Pattern = "1?(?:\(\d{3}\)\s*|\d{3}-?)\d{3}-?\d{4}(?!\s*x\d)"
regex.Global = False
Set matches = regex.Execute(strInput)
If matches.Count = 0 Then
getPhoneNumber = CVErr(xlErrNA)
Else
getPhoneNumber = matches(0).Value
End If
End Function
Results (šµAs it was, then again it will be; though the course may change sometimesšµ):
Upvotes: 1