I have a list of filenames (strings) and a set ls consisting of floats. First, I want to filter all files that match each element of ls according to a predetermined expression:
I convert all floats that are actually integers to integers and feed the result into .format to create an appropriate search string (exprs). This produces the expected sequence of strings. I now want to filter files using re.search, but as I understand it I need a different filter for each output of exprs. So I nested this inside a map function:
import re
from itertools import cycle
t = 'Matrix'
exprs = map('{}_spike_{}_D1_1'.format, cycle([t]), (int(x) if x.is_integer() else x for x in ls))
y = map(lambda f: filter(lambda i: re.search(f, i), files), exprs)
print(next(exprs)) produces the expected output, i.e. 'Matrix_spike_50_D1_1'. If I "freeze" the expression in re.search, i.e. by doing b = next(exprs) and then re.search(b, [...]), I get the expected output (i.e. the filename, correctly selected).
But when I try to use map to consume all outputs of exprs and return the resulting filter([...]) objects, I get [...] while True, catching all StopIterations and resuming.
How can I modify this to return the files that filter returns for each element of exprs?
If I understand your problem correctly, you have a list of files such as:
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
and a list of floating point numbers, which should be integers (but maybe not all of them are):
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
From the numbers in ls that are integers, you construct the names 'Matrix_spike_2_D1_1', 'Matrix_spike_3_D1_1', etc., and then select from the files list those files that satisfy the re.search call. Of course, search without the ^ and $ anchors does not perform a full match, so I wonder whether you really meant to use the fullmatch method.
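To make the search/fullmatch distinction concrete, here is a small sketch. The extra file names ending in '.csv' or starting with 'old_' are hypothetical, chosen only to show where the two methods disagree.

```python
import re

pattern = 'Matrix_spike_2_D1_1'
# Hypothetical file names: only the first one is exactly the sought name
names = ['Matrix_spike_2_D1_1', 'Matrix_spike_2_D1_1.csv', 'old_Matrix_spike_2_D1_1']

# search() succeeds if the pattern occurs anywhere in the string
searched = [n for n in names if re.search(pattern, n)]
# fullmatch() succeeds only if the pattern covers the entire string
full = [n for n in names if re.fullmatch(pattern, n)]

print(searched)  # all three names
print(full)      # ['Matrix_spike_2_D1_1']
```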
First, you have:
t = 'Matrix'
exprs = map('{}_spike_{}_D1_1'.format , cycle([t]) ,(int(x) if x.is_integer() else x for x in ls))
I believe this can be simplified to:
exprs = map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))
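The two versions are not identical: the question's version keeps non-integral values as floats, while the simplified one drops them. A sketch comparing both on the sample ls used in this answer:

```python
from itertools import cycle

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
t = 'Matrix'

# Question's version: integral floats become ints, other floats are kept as-is
original = list(map('{}_spike_{}_D1_1'.format, cycle([t]),
                    (int(x) if x.is_integer() else x for x in ls)))

# Simplified version: fixed prefix, non-integral values are skipped
simplified = list(map('Matrix_spike_{}_D1_1'.format,
                      (int(x) for x in ls if x.is_integer())))

print(original)
# ['Matrix_spike_1.1_D1_1', 'Matrix_spike_2_D1_1', 'Matrix_spike_3_D1_1', 'Matrix_spike_4_D1_1', 'Matrix_spike_5_D1_1']
print(simplified)
# ['Matrix_spike_2_D1_1', 'Matrix_spike_3_D1_1', 'Matrix_spike_4_D1_1', 'Matrix_spike_5_D1_1']
```

Wrapping the result in list() also matters here: a map object is a one-shot iterator, so each call to next(exprs), as in the question, consumes an element that a later map over exprs will no longer see.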
Note that this version only selects values from ls that are integers, which I believe is your intention. To go with your approach, I believe the simplest remedy is to define a function filter_func:
import re
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
# Compile one pattern per integral value in ls
exprs = list(map(re.compile, map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))))
def filter_func(f):
    # Keep f if any of the compiled patterns occurs in it
    for expr in exprs:
        if expr.search(f):
            return True
    return False
matched_files = list(filter(filter_func, files))
print(matched_files)
Prints:
['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
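The question literally asks for what filter returns for each element of exprs, i.e. one result per search string rather than one combined list. A sketch of that variant, which is not part of the approach above; the dictionary layout keyed by search string is an assumption about the shape you want:

```python
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']

names = ['Matrix_spike_{}_D1_1'.format(int(x)) for x in ls if x.is_integer()]

matches_by_name = {}
for name in names:
    rex = re.compile(name)
    # list() runs the lazy filter object immediately, while rex is this pattern
    matches_by_name[name] = list(filter(rex.search, files))

print(matches_by_name)
# {'Matrix_spike_2_D1_1': ['Matrix_spike_2_D1_1'], 'Matrix_spike_3_D1_1': [],
#  'Matrix_spike_4_D1_1': ['Matrix_spike_4_D1_1'], 'Matrix_spike_5_D1_1': []}
```

By contrast, the nested map(lambda f: filter(...), exprs) from the question yields unevaluated filter objects, so inspecting it shows filter objects rather than file names until each one is passed to list().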
Or with a more "functional" but perhaps less efficient approach:
import re
import operator
import functools
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
exprs = list(map(re.compile, map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))))
filter_func = lambda f: functools.reduce(operator.or_, map(lambda expr: bool(expr.search(f)), exprs), False)
matched_files = list(filter(filter_func, files))
print(matched_files)
Prints:
['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
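A middle ground between the loop and the reduce version is the built-in any(), which stops at the first pattern that matches. This swaps in any() in place of reduce(operator.or_, ...); it is a sketch on the same sample data, not a change to either approach above:

```python
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
exprs = [re.compile('Matrix_spike_{}_D1_1'.format(int(x))) for x in ls if x.is_integer()]

# any() short-circuits on the first truthy Match object, whereas
# reduce(operator.or_, ...) evaluates every pattern for every file
matched_files = [f for f in files if any(expr.search(f) for expr in exprs)]
print(matched_files)  # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
```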
But I believe your approach is not the most efficient. Instead, you should do a single regular expression search against each element of your files list. In the example above, that regular expression would be:
rex = re.compile('Matrix_spike_(?:2|3|4|5)_D1_1')
With the regular expression above, each element of the files list is checked against all 4 possible file names you are looking for in a single search. That reduces the code to:
import re
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
sub_rex = '|'.join(str(int(x)) for x in ls if x.is_integer())
rex = re.compile('Matrix_spike_(?:' + sub_rex + ')_D1_1')
matched_files = list(filter(rex.search, files))
print(matched_files)
Prints:
['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
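The combined pattern above still uses search, so it has the same partial-match caveat as before. A sketch applying the same alternation with fullmatch instead; the file name 'x_Matrix_spike_3_D1_1_old' is hypothetical, added only to show where the two differ. re.escape() changes nothing for integers, but it would matter if non-integral values such as 1.1 were kept, since '.' in a regex matches any character.

```python
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
# The last name is hypothetical, included to show the search/fullmatch difference
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1', 'x_Matrix_spike_3_D1_1_old']

# re.escape() guards against values containing regex metacharacters
sub_rex = '|'.join(re.escape(str(int(x))) for x in ls if x.is_integer())
rex = re.compile('Matrix_spike_(?:' + sub_rex + ')_D1_1')

searched = list(filter(rex.search, files))   # pattern anywhere in the name
full = list(filter(rex.fullmatch, files))    # pattern must be the whole name
print(searched)  # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1', 'x_Matrix_spike_3_D1_1_old']
print(full)      # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
```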
If you mean to do a full match (equality) against file names, then the following code would be the most efficient, because it puts the names you are looking for into a set, so each comparison is a constant-time lookup:
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
sought_files = {f'Matrix_spike_{int(x)}_D1_1' for x in ls if x.is_integer()}
matched_files = list(filter(lambda f: f in sought_files, files))
print(matched_files)
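The same set-based filter can be written as a list comprehension, and if the order and duplicates of files don't matter, a set intersection is another option. A sketch on the same sample data:

```python
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
sought_files = {f'Matrix_spike_{int(x)}_D1_1' for x in ls if x.is_integer()}

# A list comprehension keeps the order (and any duplicates) of files
in_order = [f for f in files if f in sought_files]

# Set intersection is shorter but returns an unordered set without duplicates
as_set = set(files) & sought_files

print(in_order)        # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
print(sorted(as_set))  # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
```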