Reputation: 209
I am iterating through a for loop looking for keyword matches in a list and then compiling the match indices to a third list. I can compile the indices as a list of lists, but I want to further group sub-lists by the item they matched.
import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        for m in re.finditer(pat, i):
            a = list((m.start(), m.end()))
            indices.append(a)
print(indices)
This returns:
[[0, 2], [0, 2], [1, 3]]
Trying to get:
[[0, 2], [[0, 2], [1, 3]]]
so that it is clear that:
[[0, 2], [1, 3]]
are the match indices for 'cde' in the example above.
Upvotes: 1
Views: 187
Reputation: 608
Make indices a dict:
import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices = {}
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        indices.setdefault(i, [])
        for m in re.finditer(pat, i):
            a = list((m.start(), m.end()))
            indices[i].append(a)
print(indices)
Giving:
{'cde': [[0, 2], [1, 3]], 'ab': [[0, 2]]}
Is this what you're looking for?
I played with this code for a while and, since you import itertools, you might as well use it to get rid of those ugly nested for loops ;) like this:
import re
from itertools import product
my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
indices = {}
pats = [re.compile(i) for i in keywords]
for i, pat in product(my_list, pats):
    indices.setdefault(i, [])
    for m in re.finditer(pat, i):
        indices[i].append((m.start(), m.end()))
print(indices)
Unfortunately, I can't get Bakuriu's idea of using a list comprehension to work properly, so for now this seems like the best solution to me.
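In case it helps, a comprehension can probably be made to work here by nesting it inside a dict comprehension. This is only a minimal sketch, assuming you want one entry per string in my_list and are fine with tuples for the spans; the name spans_by_item is made up for illustration:

```python
import re

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
pats = [re.compile(k) for k in keywords]

# One dict entry per string; each value collects the (start, end) spans
# of every keyword match found in that string, in keyword order.
spans_by_item = {
    s: [(m.start(), m.end()) for pat in pats for m in pat.finditer(s)]
    for s in my_list
}
print(spans_by_item)  # {'ab': [(0, 2)], 'cde': [(0, 2), (1, 3)]}
```

Unlike the setdefault loops, this builds each value in one pass, so a string with no matches still gets an empty list.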
Upvotes: 2
Reputation: 101909
Create a new list for each pattern/string pair, accumulate the matches in this list, and finally append it to the result:
import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = []
        for m in re.finditer(pat, i):
            a = list((m.start(), m.end()))
            sublist.append(a)
        indices.append(sublist)
print(indices)
Or you could use a list comprehension:
import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = [(m.start(), m.end()) for m in re.finditer(pat, i)]
        indices.append(sublist)
print(indices)
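If the goal is one sublist per string in my_list rather than one per pattern/string pair, iterating over the strings on the outside might be enough. A minimal sketch, assuming every string should get a sublist even when nothing matches; grouped is just a name I picked:

```python
import re

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
pats = [re.compile(k) for k in keywords]

# Outer comprehension: one sublist per string in my_list.
# Inner comprehension: every [start, end] span from every keyword pattern.
grouped = [
    [[m.start(), m.end()] for pat in pats for m in pat.finditer(s)]
    for s in my_list
]
print(grouped)  # [[[0, 2]], [[0, 2], [1, 3]]]
```

This nests the single match on 'ab' one level deeper than the output asked for in the question, but it keeps every entry the same shape, which is usually easier to work with afterwards.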
Upvotes: 0