Jon Cage

Reputation: 37458

Pythonic way to filter a list of data with a regex?

I have a list of strings which I'd like to filter using a regex. I have the beginnings of a solution:

import re

lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')
relevant_lines = [r.findall(line) for line in lines]
print(relevant_lines)

...that almost works:

[[], ['Data of interest'], [], ['Data of Interest'], [], ['Data of interest']]

...but is there a way to only populate the resulting list with the lines that match and without the nested lists?

Edit - is there a cleaner way than the following?

[r[0] for r in [r.findall(line) for line in lines] if len(r) > 0]

Upvotes: 1

Views: 2966

Answers (2)

Jason Hu

Reputation: 6333

relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]

Here is the result in the console:

>>> import re
>>> lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
>>> r = re.compile(r'.*[iI]nterest.*')
>>> relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]
>>> relevant_lines
['Data of interest', 'Data of Interest', 'Data of interest']

It isn't complicated: map r.match over the lines, then let the comprehension drop the None results. Combining functional tools such as map with comprehensions and generators works well here.
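To make the filtering step explicit, here is a minimal sketch of what map(r.match, lines) produces before the comprehension discards anything. The names matches and flags are only illustrative and are not part of the answer above.

```python
import re

lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest']
r = re.compile(r'.*[iI]nterest.*')

# r.match returns a match object on success and None otherwise,
# so the `is not None` test is what drops the non-matching lines.
matches = list(map(r.match, lines))
flags = [m is not None for m in matches]  # illustrative helper
relevant_lines = [m.group(0) for m in matches if m is not None]

print(flags)
print(relevant_lines)
```

Because match only anchors at the start of the string, the leading .* in the pattern is what lets it find "interest" in the middle of a line.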

Upvotes: 2

Padraic Cunningham

Reputation: 180401

Just use a normal loop; not everything is suitable for a list comprehension:

r = re.compile(r'.*[iI]nterest.*')
relevant_lines = []
for line in lines:
    mtch = r.match(line)
    if mtch:
        relevant_lines.append(mtch.group())

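If you do want a one-liner, feeding a generator expression to filter to drop the empty lists would be better than building the nested list first. Note that in Python 3 filter returns a lazy iterator, and each item it yields is still a one-element list: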

relevant_lines = filter(None,(r.findall(line) for line in lines))
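As a rough illustration of what that filter call yields on Python 3, the sketch below materialises it with list() and then flattens the result. Using itertools.chain.from_iterable for the flattening is my own addition, not something the answer proposes, and the names nested and flat are hypothetical.

```python
import re
from itertools import chain

lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# In Python 3, filter() returns a lazy iterator; list() materialises it.
nested = list(filter(None, (r.findall(line) for line in lines)))

# Each surviving item is still a one-element list, so flatten if a
# plain list of strings is wanted.
flat = list(chain.from_iterable(nested))

print(nested)
print(flat)
```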

Or indeed filter with match:

[x.group() for x in filter(None,(r.match(line) for line in lines))]

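On Python 2, use itertools.ifilter instead of filter so the filtering stays lazy.

Or, for a more functional approach, use map in place of the generator expression (on Python 2, switch map for itertools.imap and filter for itertools.ifilter):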

[x.group() for x in filter(None, map(r.match, lines))]

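Your own list comprehension can be rewritten using a generator expression for the inner loop. Use a different name for the loop variable, though: reusing r shadows the compiled pattern, and on Python 2, where the loop variable of a list comprehension leaks into the enclosing scope, it rebinds r while the generator still needs it:

[found[0] for found in (r.findall(line) for line in lines) if found]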

If you don't need the list, use a generator expression and just iterate over it.
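A minimal sketch of that last suggestion, assuming the lines contain no newline characters (so the whole line is what the pattern matches and the line itself can be yielded instead of the match group). The names relevant and seen are illustrative only.

```python
import re

lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# Generator expression: no line is tested until the loop asks for the next item.
relevant = (line for line in lines if r.match(line))

seen = []
for line in relevant:
    seen.append(line)  # stand-in for real per-line processing

print(seen)
```

A generator can only be consumed once, so build a list instead if the results are needed more than once.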

Upvotes: 2
