Alex

Reputation: 23

Python - Filter nested inside of map produces unexpected output

I have a list of filenames (strings) and a set ls consisting of floats. First, I want to find all files that match each element of ls according to a predetermined expression: I convert every float that is actually an integer to an int and feed that into .format to create the appropriate search string (exprs). This produces the expected sequence of strings. I now want to filter 'files' using re.search, but as I understand it I need a different filter for each output of exprs, so I nested the filter inside a map:

import re
from itertools import cycle

t = 'Matrix'
exprs = map('{}_spike_{}_D1_1'.format, cycle([t]), (int(x) if x.is_integer() else x for x in ls))
y = map(lambda f: filter(lambda i: re.search(f, i), files), exprs)

print(next(exprs)) produces the expected output, i.e. 'Matrix_spike_50_D1_1'. If I 'freeze' the expression passed to re.search, i.e. by doing b = next(exprs) and re.search(b, [...]), I get the expected output (the filename, correctly selected). But when I try to use map to consume all outputs of exprs and return the resulting filter([...]) objects, I get

  1. a filter object instead of a map object
  2. two identical filter objects when running it exhaustively by means of a while True loop, catching all StopIterations and resuming

How can I modify this to return the files that filter selects for each element of exprs?

Upvotes: 0

Views: 385

Answers (1)

Booboo

Reputation: 44148

If I understand your problem correctly, you have a list of files such as:

files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']

and a list of floating point numbers, which should be integers (but maybe not all of them are):

ls = [1.1, 2.0, 3.0, 4.0, 5.0]

From the numbers in ls that are integers, you construct the names 'Matrix_spike_2_D1_1', 'Matrix_spike_3_D1_1', etc. and then select from the files list those files that satisfy the re.search call. Of course, search without the ^ and $ anchors does not do a full match, so I wonder whether you really meant to use the fullmatch method.

First, you have:

t = 'Matrix'
exprs = map('{}_spike_{}_D1_1'.format , cycle([t]) ,(int(x) if x.is_integer() else x for x in ls))

I believe this can be simplified to:

exprs = map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))
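If you would rather keep your nested map/filter structure, note that both levels are lazy: y yields one filter object per pattern, and each of those must itself be consumed. The sketch below (my own illustration, using the sample data from this answer) materializes every inner filter with list(). It also keeps in mind that exprs is a one-shot iterator, so any next(exprs) call made beforehand removes that pattern from what y will see.

```python
import re

# Sample data from this answer
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']

# One-shot iterator: a prior next(exprs) would silently drop that pattern from y
exprs = map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))

# y lazily yields one filter object per pattern
y = map(lambda f: filter(lambda i: re.search(f, i), files), exprs)

# Consume each inner filter so the matches become visible
per_pattern = [list(matches) for matches in y]
print(per_pattern)  # [['Matrix_spike_2_D1_1'], [], ['Matrix_spike_4_D1_1'], []]
```

The patterns for 3 and 5 match nothing, so they produce empty lists. For one flat list instead, [m for matches in y for m in matches] works the same way, as long as y has not already been consumed.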

Note that I am only selecting the values in ls that are integers, which I believe is your intention. Staying close to your approach, I believe the simplest remedy is to compile the patterns once and define a single predicate, filter_func, that is passed to one filter call:

import re


ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
exprs = list(map(re.compile, map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))))

def filter_func(f):
    for expr in exprs:
        if expr.search(f):
            return True
    return False

matched_files = list(filter(filter_func, files))
print(matched_files)

Prints:

['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
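The explicit loop in filter_func is exactly what the built-in any() does, including stopping at the first pattern that matches. A compact equivalent, sketched with the same sample data (building each pattern with an f-string is just my shorthand for the two nested map calls):

```python
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
exprs = [re.compile(f'Matrix_spike_{int(x)}_D1_1') for x in ls if x.is_integer()]

def filter_func(f):
    # any() returns True as soon as one pattern matches, like the explicit loop
    return any(expr.search(f) for expr in exprs)

matched_files = [f for f in files if filter_func(f)]
print(matched_files)  # ['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
```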

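Or, with a more "functional" but probably less efficient approach (functools.reduce does not short-circuit, so every pattern is searched even after one has already matched):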

import re
import operator
import functools


ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
exprs = list(map(re.compile, map('Matrix_spike_{}_D1_1'.format, (int(x) for x in ls if x.is_integer()))))
filter_func = lambda f: functools.reduce(operator.or_, map(lambda expr: bool(expr.search(f)), exprs), False)
matched_files = list(filter(filter_func, files))
print(matched_files)

Prints:

['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
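To see the efficiency difference concretely, the sketch below counts how many searches each style runs for a name that matches the very first pattern. counting_search is a hypothetical helper introduced only for this illustration.

```python
import functools
import operator
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
exprs = [re.compile(f'Matrix_spike_{int(x)}_D1_1') for x in ls if x.is_integer()]

def counting_search(expr, f, calls):
    # Hypothetical helper: record every search that actually runs
    calls.append(expr.pattern)
    return bool(expr.search(f))

name = 'Matrix_spike_2_D1_1'  # matches the first pattern

reduce_calls = []
functools.reduce(operator.or_,
                 map(lambda e: counting_search(e, name, reduce_calls), exprs),
                 False)

any_calls = []
any(counting_search(e, name, any_calls) for e in exprs)

print(len(reduce_calls), len(any_calls))  # 4 1
```

reduce consumes the whole map, so all four patterns are searched, while any() stops after the first hit.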

But I believe searching with one pattern per number is not the most efficient approach. Instead, you should do a single regular expression search against each element of your files list, using one pattern that lists all the numbers as alternatives. In the example above, that regular expression would be:

rex = re.compile('Matrix_spike_(?:2|3|4|5)_D1_1')

With this regular expression, each element of the files list is checked against all 4 possible file names you are looking for in a single search. Building the alternation from ls reduces the code to:

import re


ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
sub_rex = '|'.join(str(int(x)) for x in ls if x.is_integer())
rex = re.compile('Matrix_spike_(?:' + sub_rex + ')_D1_1')
matched_files = list(filter(rex.search, files))
print(matched_files)

Prints:

['Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
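As mentioned at the start, search does not anchor the pattern, so names that merely contain a sought name also pass. The sketch below compares search with fullmatch on the same combined regex; the three file names are hypothetical, chosen only to show the difference.

```python
import re

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
sub_rex = '|'.join(str(int(x)) for x in ls if x.is_integer())
rex = re.compile('Matrix_spike_(?:' + sub_rex + ')_D1_1')

# Hypothetical names: an exact hit, a prefixed name, and a longer suffix
files = ['Matrix_spike_2_D1_1', 'old_Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_10']

searched = [f for f in files if rex.search(f)]   # substring match anywhere
full = [f for f in files if rex.fullmatch(f)]    # whole string must match
print(searched)  # ['Matrix_spike_2_D1_1', 'old_Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_10']
print(full)      # ['Matrix_spike_2_D1_1']
```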

If you really want a full match (equality) against file names, then the following code would be the most efficient: it puts the names you are looking for into a set, so each membership test is an average constant-time lookup:

ls = [1.1, 2.0, 3.0, 4.0, 5.0]
files = ['a', 'b', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']
sought_files = {f'Matrix_spike_{int(x)}_D1_1' for x in ls if x.is_integer()}
matched_files = list(filter(lambda f: f in sought_files, files))
print(matched_files)
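Your original expression kept non-integer values (int(x) if x.is_integer() else x) rather than discarding them. If that is what you want, the set approach handles it directly. A sketch assuming that rule, with a hypothetical file name 'Matrix_spike_1.1_D1_1' added to show it; as_label is a helper I introduce for readability:

```python
ls = [1.1, 2.0, 3.0, 4.0, 5.0]
# 'Matrix_spike_1.1_D1_1' is a hypothetical name added for this illustration
files = ['a', 'Matrix_spike_1.1_D1_1', 'Matrix_spike_2_D1_1', 'c', 'Matrix_spike_4_D1_1']

def as_label(x):
    # Question's rule: whole-number floats lose their '.0', others keep their value
    return int(x) if x.is_integer() else x

sought_files = {f'Matrix_spike_{as_label(x)}_D1_1' for x in ls}
matched_files = [f for f in files if f in sought_files]
print(matched_files)  # ['Matrix_spike_1.1_D1_1', 'Matrix_spike_2_D1_1', 'Matrix_spike_4_D1_1']
```

If you combine this rule with one of the regex versions instead, pass each label through re.escape so the '.' in '1.1' is matched literally rather than as any character.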

Upvotes: 3
