baobobs

Reputation: 703

Match two Python lists with regular expressions, and create dictionary output

I have the following two lists:

input = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']

ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']

For each element of input, I would like to find every pattern in ref that matches it. The output should be a dictionary in which each key is an input element and each value holds the ref element(s) that matched it, like the following:

{'MAPLEWOOD AVE': ['MAPLEWOOD AVE'], 'LYNNDALE ': ['LYNNDALE RD', 'LYNNDALE LN'], 'SUGAR DR': []}

The following lets me iterate over input and use findall to test each element against the patterns in ref (which contain embedded non-capturing groups). However, I cannot retrieve the matching ref element(s) to store as values alongside each input element:

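import re

# Join every pattern into one alternation, one capturing group per pattern
combined = "(" + ")|(".join(ref) + ")"

l = []

for i in input:
    # Keep the input element if any of the patterns matches it
    if re.findall(combined, i):
        l.append(i)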
...
MAPLEWOOD AVE
LYNNDALE
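
A possible reason the combined pattern falls short, sketched under the assumption that each alternative is wrapped in exactly one capturing group as above: the match object's lastindex attribute gives the number of the group that matched, which maps back to an index in ref, but an alternation stops at the first alternative that succeeds. The names inputs and first_hit are only illustrative.

```python
import re

inputs = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']

# One capturing group per pattern, so group n corresponds to ref[n - 1]
combined = re.compile("(" + ")|(".join(ref) + ")")

first_hit = {}
for street in inputs:
    m = combined.match(street)
    # lastindex is the 1-based number of the capturing group that matched
    first_hit[street] = ref[m.lastindex - 1] if m else None

print(first_hit)
```

For 'LYNNDALE ' this reports only 'LYNNDALE (?:RD)?' and never 'LYNNDALE (?:LN)?', so collecting all matching patterns seems to need a separate test per pattern.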

Upvotes: 1

Views: 2927

Answers (2)

Ivan Yurchenko

Reputation: 566

Try:

import re

input = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']
output = dict([ (i, [ r for r in ref if re.match(r, i) ]) for i in input ])

Or, with a dict comprehension (available in Python 2.7 and later):

output = { i : [ r for r in ref if re.match(r, i) ] for i in input }

Also, you could compile your regexes to speed things up a little:

ref_re = [ re.compile(r) for r in ref ]
output = { i : [ r.pattern for r in ref_re if r.match(i) ] for i in input }

UPD: Maybe you want the matched text as values rather than the patterns:

output = { i : [ r.match(i).group(0) for r in ref_re if r.match(i) ] for i in input }
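
If the values should be readable names such as 'LYNNDALE RD' (as in the question's expected output) rather than the pattern or the matched text, here is a minimal sketch. It assumes every pattern has the simple shape 'NAME (?:SUFFIX)?'. The helper label is hypothetical and would need adjusting for more complex patterns.

```python
import re

inputs = ['MAPLEWOOD AVE', 'LYNNDALE ', 'SUGAR DR']
ref = ['LYNNDALE (?:RD)?', 'HOMAN (?:AVE)?', 'MAPLEWOOD (?:AVE)?', 'LYNNDALE (?:LN)?']

def label(pattern):
    # Hypothetical helper: unwrap the optional group,
    # e.g. 'LYNNDALE (?:RD)?' becomes 'LYNNDALE RD'
    return re.sub(r'\(\?:(.*?)\)\?', r'\1', pattern)

# Pair each readable name with its compiled pattern
compiled = [(label(p), re.compile(p)) for p in ref]

# For every input, collect the names of all patterns that match it
output = {s: [name for name, rx in compiled if rx.match(s)] for s in inputs}

print(output)
# {'MAPLEWOOD AVE': ['MAPLEWOOD AVE'], 'LYNNDALE ': ['LYNNDALE RD', 'LYNNDALE LN'], 'SUGAR DR': []}
```

Note that match only anchors at the start of the string, so an input with extra trailing text would still count as a match.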

Upvotes: 5

Ander2

Reputation: 5668

I think you missed the blank spaces in the regexp. Try it this way:

ref = [r'LYNNDALE\s*(?:RD)?', r'HOMAN\s*(?:AVE)?', r'MAPLEWOOD\s*(?:AVE)?', r'LYNNDALE\s*(?:LN)?']
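
A small sketch of what the \s* change does in practice, comparing one pattern with a literal space against the \s* version. The sample strings and the names strict and relaxed are made up for illustration.

```python
import re

strict = re.compile(r'LYNNDALE (?:RD)?')     # exactly one space required
relaxed = re.compile(r'LYNNDALE\s*(?:RD)?')  # zero or more whitespace characters

samples = ['LYNNDALE', 'LYNNDALE ', 'LYNNDALE  RD']

# Which samples each pattern accepts at the start of the string
strict_hits = [s for s in samples if strict.match(s)]
relaxed_hits = [s for s in samples if relaxed.match(s)]

# How much of the double-spaced sample each pattern consumes
strict_text = strict.match('LYNNDALE  RD').group(0)
relaxed_text = relaxed.match('LYNNDALE  RD').group(0)

print(strict_hits, relaxed_hits)
print(repr(strict_text), repr(relaxed_text))
```

The literal-space pattern rejects 'LYNNDALE' without a trailing space and stops before the suffix when there are two spaces, while the \s* pattern accepts all three samples and consumes the suffix.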

Upvotes: 0
