Reputation: 1043
I have two lists: ignorelist, which is a list of regular expressions, and another list called urllist. I am trying to make it so that if an item in urllist matches a regular expression in ignorelist, it does not get added to finallist.
import re

ignorelist = ['(?:\.)amazon\.com(?:\/(?:.*))',
              '(?:\.)google\.com(?:\/(?:.*))']
urllist = ['api.amazon.com/', 'fakedomain.com/']
finallist = []
for r in ignorelist:
    r = re.compile(r)
    finallist = [x for x in urllist if not r.match(x)]
which outputs
['api.amazon.com/', 'fakedomain.com/']
I'm trying to make the output be ['fakedomain.com/'], because that is the only URL that doesn't match any of the regular expressions in ignorelist.
Upvotes: 1
Views: 2530
Reputation: 11423
You are filtering once for each regex in your ignore list and reassigning finallist each time, so only the last regex is taken into account. Instead, check each URL against all of the regexes:
finallist = []
for url in urllist:
    # keep the URL only if none of the ignore patterns match it
    if not any(re.search(r, url) for r in ignorelist):
        finallist.append(url)
or using a list comprehension:
finallist = [url for url in urllist if not any(re.search(r, url) for r in ignorelist)]
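To see why only the last regex counts in the original loop, here is a minimal sketch. The helper name last_pattern_wins is made up for illustration, and it uses re.search rather than re.match so the effect of each pattern is visible:

```python
import re

urllist = ['api.amazon.com/', 'fakedomain.com/']
amazon = r'(?:\.)amazon\.com(?:\/(?:.*))'
google = r'(?:\.)google\.com(?:\/(?:.*))'

def last_pattern_wins(patterns):
    # Hypothetical helper that mimics the loop from the question.
    finallist = []
    for p in patterns:
        # Each pass rebuilds the list from urllist, discarding the previous result.
        finallist = [x for x in urllist if not re.search(p, x)]
    return finallist

print(last_pattern_wins([amazon, google]))  # only the google filter is applied
print(last_pattern_wins([google, amazon]))  # only the amazon filter is applied
```

With the amazon pattern last, the amazon URL is dropped; with the google pattern last, nothing is dropped, because the earlier filtering was overwritten.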
Upvotes: 1
Reputation: 140307
Several issues here: re.match only matches at the start of the string, and your expressions are not built for that. Use re.search instead. I would do:
import re
# raw strings keep the backslashes intact without invalid-escape warnings
ignorelist = [r'(?:\.)amazon\.com(?:\/(?:.*))',
              r'(?:\.)google\.com(?:\/(?:.*))']
urllist = ['api.amazon.com/', 'fakedomain.com/']
finallist = [x for x in urllist if not any(re.search(r,x) for r in ignorelist)]
so finallist contains only the URLs that do not match any of the regexes in ignorelist.
result:
['fakedomain.com/']
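As a quick check of the match vs. search difference, this small sketch runs one of the patterns from the question against the amazon URL (the variable names pattern and url are just for illustration):

```python
import re

pattern = r'(?:\.)amazon\.com(?:\/(?:.*))'
url = 'api.amazon.com/'

# re.match anchors at position 0; the URL starts with 'api', not '.', so no match.
print(re.match(pattern, url))

# re.search scans the whole string and finds the pattern further in.
found = re.search(pattern, url)
print(found.group())
```

The first call prints None and the second prints the matched substring, '.amazon.com/'.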
note that I didn't "compile" the regexes, but you may gain some speed by doing so when testing a lot of domains.
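If you do want to compile them, a minimal sketch could look like this (the name compiled is my own choice). Keep in mind that the re module already caches recently used patterns, so the gain may be modest and is worth measuring on your own data:

```python
import re

ignorelist = [r'(?:\.)amazon\.com(?:\/(?:.*))',
              r'(?:\.)google\.com(?:\/(?:.*))']
urllist = ['api.amazon.com/', 'fakedomain.com/']

# Compile each pattern once, up front, instead of passing strings on every call.
compiled = [re.compile(p) for p in ignorelist]

# Keep a URL only if none of the compiled patterns is found anywhere in it.
finallist = [url for url in urllist
             if not any(rx.search(url) for rx in compiled)]

print(finallist)  # ['fakedomain.com/']
```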
Upvotes: 3