Reputation: 37458
I have a list of strings which I'd like to filter using a regex. I have the beginnings of a solution:
import re
lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')
relevant_lines = [r.findall(line) for line in lines]
print(relevant_lines)
...that almost works:
[[], ['Data of interest'], [], ['Data of Interest'], [], ['Data of interest']]
...but is there a way to only populate the resulting list with the lines that match and without the nested lists?
Edit - is there a cleaner way than the following?
[r[0] for r in [r.findall(line) for line in lines] if len(r) > 0]
Upvotes: 1
Views: 2966
Reputation: 6333
relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]
Here is the result in the console:
>>> import re
>>> lines = ['Some data', 'Data of interest', 'Some data', 'Data of Interest', 'Some data', 'Data of interest']
>>> r = re.compile(r'.*[iI]nterest.*')
>>> relevant_lines = [m.group(0) for m in map(r.match, lines) if m is not None]
>>> relevant_lines
['Data of interest', 'Data of Interest', 'Data of interest']
It doesn't need to be complicated: map applies r.match to every line, and the list comprehension keeps only the lines that actually matched.
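As a variation on the above (a sketch, not the code from this answer): if all you want back is the original line, you can drop the leading and trailing `.*` and use `search`, which scans the whole string. The name `pattern` is just illustrative.

```python
import re

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']

# search() looks for the pattern anywhere in the string,
# so the surrounding .* from the original regex is not needed.
pattern = re.compile(r'[iI]nterest')

# Keep the line itself whenever the pattern is found in it.
relevant_lines = [line for line in lines if pattern.search(line)]
print(relevant_lines)
```

This returns the untouched input strings rather than the matched text, which is the same thing here because the original pattern matched the whole line.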
Upvotes: 2
Reputation: 180401
Just use a normal loop; not everything is suitable for a list comp:
r = re.compile(r'.*[iI]nterest.*')
relevant_lines = []
for line in lines:
    mtch = r.match(line)
    if mtch:
        relevant_lines.append(mtch.group())
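If you do want a one-liner, passing a generator expression to filter drops the empty lists without building an intermediate list. Note that each remaining item is still a one-element list, and on Python 3 the result is a lazy filter object rather than a list:
relevant_lines = filter(None, (r.findall(line) for line in lines))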
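To make the nesting concrete, here is a small sketch assuming Python 3 and the same `lines` and `r` as in the question. The `itertools.chain.from_iterable` step is my own addition for flattening, not part of the snippet above.

```python
import re
from itertools import chain

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# filter(None, ...) removes the empty lists, but each hit is still a list.
nested = list(filter(None, (r.findall(line) for line in lines)))
print(nested)

# chain.from_iterable flattens the per-line lists; empty ones contribute nothing.
flat = list(chain.from_iterable(r.findall(line) for line in lines))
print(flat)
```

The flattening version does not need the filter step at all, since an empty `findall` result adds no items.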
Or indeed filter with match:
[x.group() for x in filter(None,(r.match(line) for line in lines))]
On Python 2, filter builds a full list, so use itertools.ifilter there to keep it lazy.
Or, for a more functional approach, combine map and filter (on Python 2, swap them for itertools.imap and itertools.ifilter):
[x.group() for x in filter(None, map(r.match, lines))]
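Your own list comp can be rewritten using a generator expression for the inner loop. The loop variable is renamed here so that it no longer shadows the compiled pattern r, which breaks on Python 2, where list comp variables leak into the enclosing scope:
[found[0] for found in (r.findall(line) for line in lines) if found]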
If you don't need the list, use a generator expression and just iterate over it.
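A minimal sketch of that last suggestion, assuming Python 3 (where `map` is lazy) and the same `lines` and `r` as in the question; the names `matching`, `first` and `rest` are just illustrative.

```python
import re

lines = ['Some data', 'Data of interest', 'Some data',
         'Data of Interest', 'Some data', 'Data of interest']
r = re.compile(r'.*[iI]nterest.*')

# Nothing is matched yet; the generator does its work as it is consumed.
matching = (m.group() for m in map(r.match, lines) if m)

# Pull one result: only the first two lines have been scanned at this point.
first = next(matching)
print(first)

# Iterate over whatever is left.
rest = []
for text in matching:
    rest.append(text)
print(rest)
```

A generator can only be consumed once, so wrap it in `list(...)` if you need to go over the results again.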
Upvotes: 2