python re.match list of regular expressions

Question

I have two lists: ignorelist, which is a list of regular expressions, and another list called urllist. I am trying to make it so that if an item in urllist matches a regular expression in ignorelist, it will not get added to finallist.

ignorelist = ['(?:\.)amazon\.com(?:\/(?:.*))',
            '(?:\.)google\.com(?:\/(?:.*))']

urllist = ['api.amazon.com/', 'fakedomain.com/']
finallist = []

for r in ignorelist:
    r = re.compile(r)
    finallist = [x for x in urllist if not r.match(x)]

which outputs

['api.amazon.com/', 'fakedomain.com/']

I'm trying to make the output be ['fakedomain.com/'], because that URL is the only one that doesn't match any of the regular expressions in ignorelist.

Jean-François Fabre · Accepted Answer

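Several issues here:

re.match only matches at the start of the string. Your expressions are not built for that (they begin with a literal dot, but the URLs don't). Use re.search.
You're assigning the result inside the loop, which is the wrong logic: finallist is rebuilt from urllist on every iteration, so only the filter for the last regex survives.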
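
A quick way to see the first point, as a minimal sketch (the names pattern and url are only for illustration, and the pattern is written as a raw string):

```python
import re

pattern = r'(?:\.)amazon\.com(?:\/(?:.*))'
url = 'api.amazon.com/'

# re.match only tries position 0, where the string starts with "api", not "."
match_result = re.match(pattern, url)
print(match_result)             # None

# re.search scans the whole string and finds the pattern further in
search_result = re.search(pattern, url)
print(search_result.group())    # .amazon.com/
```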

I would do:

import re

# raw strings keep the regex backslashes from being read as string escapes
ignorelist = [r'(?:\.)amazon\.com(?:\/(?:.*))',
            r'(?:\.)google\.com(?:\/(?:.*))']

urllist = ['api.amazon.com/', 'fakedomain.com/']


finallist = [x for x in urllist if not any(re.search(r,x) for r in ignorelist)]

So finallist contains only the URLs that don't match any of the regexes in ignorelist.

result:

['fakedomain.com/']

Note that I didn't compile the regexes, but you may gain some speed by doing so when testing a lot of URLs.
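
For reference, a precompiled variant could look roughly like this. It is only a sketch: the name compiled is introduced here for illustration, and since the re module already caches recently used patterns, the actual gain may be modest.

```python
import re

ignorelist = [r'(?:\.)amazon\.com(?:\/(?:.*))',
              r'(?:\.)google\.com(?:\/(?:.*))']
urllist = ['api.amazon.com/', 'fakedomain.com/']

# compile each pattern once, before filtering
compiled = [re.compile(r) for r in ignorelist]

# keep a URL only if none of the compiled patterns is found in it
finallist = [x for x in urllist if not any(c.search(x) for c in compiled)]
print(finallist)  # ['fakedomain.com/']
```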
