Reputation: 89
I have a list of addresses with postal codes embedded in them, as shown below:
65 Windermere Ave., Toronto, ON, M6S 3J4
15 Bruyeres Mews, Toronto, ON M5V 0G8, Canada
M6M1N7, Canada
437 revus Ave, L5G 1S2, Mississauga, ON
, ST. CATHARINES L2M 6Z2 ON
15 Viking Lane, Toronto M9B0A4 ON
I tried this regex to extract the postal codes but got no output:
^.{3}(?:[\s]).{3}^$
Update
Postal code examples: M6S 3J4, L5G 1S2, M9B0A4
Upvotes: 1
Views: 1253
Reputation: 4472
You can try this regex: (([A-Z]\d[A-Z])+\s?\d[A-Z]\d)
It matches an uppercase letter followed by a digit and another uppercase letter, then an optional whitespace character, then a digit, an uppercase letter, and another digit.
import re
txt = """
65 Windermere Ave., Toronto, ON, M6S 3J4
15 Bruyeres Mews, Toronto, ON M5V 0G8, Canada
M6M1N7, Canada
437 revus Ave, L5G 1S2, Mississauga, ON
, ST. CATHARINES L2M 6Z2 ON
15 Viking Lane, Toronto M9B0A4 ON
"""
# findall returns one tuple per match here; i[0] is the outer group (the full postal code)
print([i[0] for i in re.findall(r"(([A-Z]\d[A-Z])+\s?\d[A-Z]\d)", txt)])
Output
['M6S 3J4', 'M5V 0G8', 'M6M1N7', 'L5G 1S2', 'L2M 6Z2', 'M9B0A4']
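The `i[0]` indexing above is needed because `re.findall` returns one tuple per match when the pattern contains more than one capturing group. As a minimal sketch (the `sample` string and variable names are made up for illustration), non-capturing groups `(?:...)` give you the matched strings directly:

```python
import re

# Hypothetical sample text, not taken from the question
sample = "ON, M6S 3J4 and M9B0A4"

# With capturing groups, findall returns one tuple per match (one item per group)
grouped = re.findall(r"(([A-Z]\d[A-Z])\s?(\d[A-Z]\d))", sample)
print(grouped)

# With non-capturing groups, findall returns the full matched strings
flat = re.findall(r"(?:[A-Z]\d[A-Z])\s?(?:\d[A-Z]\d)", sample)
print(flat)
```

Either form finds the same postal codes; the second just avoids the list comprehension.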
Upvotes: 4
Reputation: 5237
Canadian postal codes take the format A1A 1A1, where A is a letter and 1 is a digit. You're better off matching that explicitly in your regex.
A working regex would look like this:
[A-Z]\d[A-Z]\s?\d[A-Z]\d
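As a quick sanity check, the pattern can be tried against the example codes from the question. This is only a sketch: the name `POSTAL_CODE` and the non-matching string "Toronto" are chosen here for illustration, and `fullmatch` is used so that the whole string has to be a postal code.

```python
import re

# Compile the pattern once so it can be reused
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z]\s?\d[A-Z]\d")

# The first three candidates are the examples from the question
for candidate in ["M6S 3J4", "L5G 1S2", "M9B0A4", "Toronto"]:
    # fullmatch succeeds only if the entire string matches the pattern
    print(candidate, bool(POSTAL_CODE.fullmatch(candidate)))
```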
A few things to note:
Use re.search and not re.match, since re.match anchors the match at the start of the string.
Make the \s optional using ? (which means zero or one occurrences).
Here's a working example:
lines = '''65 Windermere Ave., Toronto, ON, M6S 3J4
15 Bruyeres Mews, Toronto, ON M5V 0G8, Canada
M6M1N7, Canada
437 revus Ave, L5G 1S2, Mississauga, ON
, ST. CATHARINES L2M 6Z2 ON
15 Viking Lane, Toronto M9B0A4 ON'''
import re
for s in lines.split('\n'):
    m = re.search(r'([A-Z]\d[A-Z]\s?\d[A-Z]\d)', s)
    if m:
        print(m.group(1))
Output:
M6S 3J4
M5V 0G8
M6M1N7
L5G 1S2
L2M 6Z2
M9B0A4
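To illustrate the difference between re.search and re.match, here is a small sketch using one of the address lines from the question (the variable names are chosen here for illustration). Because the postal code is not at the start of the line, re.match finds nothing while re.search scans the whole string:

```python
import re

pattern = r"[A-Z]\d[A-Z]\s?\d[A-Z]\d"
line = "15 Viking Lane, Toronto M9B0A4 ON"

# re.match only tries to match at the beginning of the string
matched = re.match(pattern, line)
print(matched)  # None

# re.search looks for the first match anywhere in the string
found = re.search(pattern, line)
print(found.group(0))  # M9B0A4
```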
If you want a fully general solution for matching postal codes from around the world, then check out this post.
Upvotes: 3