Reputation: 31
Using the solution posted here, I'm looking to extract postal codes from a list of irregular data in Excel.
Below is a sample of what my data looks like:
Brampton L6P 2G9 ON Canada
M5B2R3 Toronto ON
Toronto M5J 0A6 ON Canada
M1H1T7 Canada
Toronto M4P1T8 ON Canada
MISSISUAGABRAMPTON L5M6S6 ON Canada
333 Sea Ray Inisfil l4e2y6 ON Canada
To call the function, I'm using the following formula
=RegexExtract(A1,"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$")
However the function is not working for me. I think I need to tweak my regex expression in some way but I don't know what I'm missing.
Upvotes: 3
Views: 1703
Reputation:
Try,
=REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d")
'alternate
=left(REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d"), 3)&" "&right(REGEXEXTRACT(upper(A2), "[A-X]\d[A-Z] ?\d[A-Z]\d"), 3)
Upvotes: 4
Reputation: 22195
You have 2 issues.
First, the expression - if you need to extract the postal code, you can't anchor your regex with ^
and $
. The first means "match must occur at the start of the string" and the second means "match must end at the end of the string". This is only useful if you are validating a postal code, but it obviously can't be used to extract one from your examples because they all contain things other than the postal code.
The other problem with the regex is the negative look-ahead assertion (?!.*[DFIOQU])
, which means "no match can contain the letters D, F, I, O, Q, or U". To the best of my recollection, this isn't supported in VBScript regex. If I'm mistaken, please correct me in the comments.
That gives you the slightly more pedantic expression:
[ABCEGHJKLMNPRSTVX]\d[ABCEGHJKLMNPRSTVWXYZ][ -]?\d[ABCEGHJKLMNPRSTVWXYZ]\d
I took the liberty of optionally allowing a -
between the FSA and LDU because I see that a lot, particularly from non-Canadians.
Second, the function that you're calling (copied below from the linked answer):
Function RegexExtract(ByVal text As String, _ ByVal extract_what As String, _ Optional separator As String = ", ") As String Dim allMatches As Object Dim RE As Object Set RE = CreateObject("vbscript.regexp") Dim i As Long, j As Long Dim result As String RE.pattern = extract_what RE.Global = True Set allMatches = RE.Execute(text) For i = 0 To allMatches.count - 1 For j = 0 To allMatches.Item(i).submatches.count - 1 result = result & (separator & allMatches.Item(i).submatches.Item(j)) Next Next If Len(result) <> 0 Then result = Right$(result, Len(result) - Len(separator)) End If RegexExtract = result End Function
The first problem is that it is case sensitive. It is also tailored to extracting submatches, which you don't care about - your examples are looking for a single match.
I'd go with this much simpler option that also correctly formats the output:
Public Function ExtractCanadianPostalCode(inputText As String) As String
With CreateObject("vbscript.regexp")
.Pattern = "[ABCEGHJKLMNPRSTVX]\d[ABCEGHJKLMNPRSTVWXYZ][ -]?\d[ABCEGHJKLMNPRSTVWXYZ]\d"
.IgnoreCase = True
If .Test(inputText) Then
Dim matches As Object
Set matches = .Execute(inputText)
ExtractCanadianPostalCode = UCase$(Left$(matches(0), 3) & " " & Right$(matches(0), 3))
End If
End With
End Function
Upvotes: 3