Reputation: 703
I have the following two lists:
input = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']
I would like to look for all matches for each element within input with ref. The output would be a dictionary with each key being an input element, and each value being the ref element(s) matched to the corresponding input element, like the following:
{'MAPLEWOOD AVE': ['MAPLEWOOD AVE'], 'LYNNDALE ': ['LYNNDALE RD', 'LYNNDALE LN'], 'SUGAR DR': []}
The following allows me to iterate over input in search of a findall match within ref (which contains embedded regex groupings). However, I cannot retrieve the corresponding match element(s) from ref as values alongside each input element:
import re
combined = "(" + ")|(".join(ref) + ")"
l = []
for i in input:
    if re.findall(combined, i):
        l.append(i)
...
MAPLEWOOD AVE
LYNNDALE
Upvotes: 1
Views: 2927
Reputation: 566
Try:
import re
input = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']
output = dict([ (i, [ r for r in ref if re.match(r, i) ]) for i in input ])
Or, with a dict comprehension (available in Python 2.7+ and Python 3):
output = { i : [ r for r in ref if re.match(r, i) ] for i in input }
You could also compile your regexes to speed them up a little:
ref_re = [ re.compile(r) for r in ref ]
output = { i : [ r.pattern for r in ref_re if r.match(i) ] for i in input }
UPD: Maybe you want the matched part as values, not the patterns:
output = { i : [ r.match(i).group(0) for r in ref_re if r.match(i) ] for i in input }
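The last version calls match twice per pattern. A minimal sketch of the same idea that matches only once is below; the names inputs and matched_parts are mine (I renamed input so it does not shadow the builtin), not something from the question. Note that for 'LYNNDALE ' the matched part is just 'LYNNDALE ' for both LYNNDALE patterns, because the optional suffix is not present in the input.

```python
import re

# Same data as in the question; `inputs` is a renamed copy of `input`.
inputs = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']

ref_re = [re.compile(r) for r in ref]

def matched_parts(text):
    """Return the matched text for every ref pattern that matches at the start of text."""
    matches = (r.match(text) for r in ref_re)   # one match call per pattern
    return [m.group(0) for m in matches if m]   # keep only successful matches

output = {i: matched_parts(i) for i in inputs}
print(output)
```

Inputs with no matching pattern (such as 'SUGAR DR') end up with an empty list rather than being dropped.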
Upvotes: 5
Reputation: 5668
I think your patterns do not account for optional whitespace. Try it this way (raw strings keep \s from being treated as a string escape):
ref = [r'LYNNDALE\s*(?:RD)?', r'HOMAN\s*(?:AVE)?', r'MAPLEWOOD\s*(?:AVE)?', r'LYNNDALE\s*(?:LN)?']
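To see what the \s* change does in practice, here is a small comparison sketch. The sample strings and the names strict, loose and results are made up for illustration; only the two patterns come from this thread.

```python
import re

strict = re.compile(r'LYNNDALE (?:RD)?')    # pattern from the question: exactly one space
loose = re.compile(r'LYNNDALE\s*(?:RD)?')   # pattern from this answer: any amount of whitespace

def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = pattern.match(text)
    return m.group(0) if m else None

samples = ['LYNNDALE', 'LYNNDALE ', 'LYNNDALE  RD', 'LYNNDALERD']
results = {s: (matched_text(strict, s), matched_text(loose, s)) for s in samples}

for sample, (strict_hit, loose_hit) in results.items():
    print(repr(sample), '->', repr(strict_hit), repr(loose_hit))
```

One thing to keep in mind: \s* also accepts zero spaces, so 'LYNNDALERD' matches the loose pattern. If at least one space should be required before the suffix, something like (?:\s+RD)? may be closer to what you want, but that depends on your data.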
Upvotes: 0