Reputation: 1677
I want to map a list of tuples like the one below using the function process_slide_index(x)
tiles_index:
[(1, 1024, 0, 16, 0, 0), (1, 1024, 0, 16, 0, 1), (1, 1024, 0, 16, 0, 2), (1, 1024, 0, 16, 0, 3), (1, 1024, 0, 16, 0, 4), (1, 1024, 0, 16, 0, 5), (1, 1024, 0, 16, 0, 6),...]
tiles:
tiles = map(lambda x: process_slide_index(x), tiles_index)
the map function:
def process_slide_index(tile_index):
    print("PROCESS SLIDE INDEX")
    slide_num, tile_size, overlap, zoom_level, col, row = tile_index
    slide = open_slide(slide_num)
    generator = create_tile_generator(slide, tile_size, overlap)
    tile = np.asarray(generator.get_tile(zoom_level, (col, row)))
    return (slide_num, tile)
I'm applying the map function, but execution never seems to enter my process_slide_index(tile_index) function.
I also want to filter some results using a function that returns True or False, but once again execution never reaches the filter function.
filtered_tiles = filter(lambda x: keep_tile(x, tile_size, tissue_threshold), tiles)
What am I doing wrong?
Regards
EDIT: The only way I got the checkpoint message PROCESS SLIDE INDEX to appear was by adding list(map(print, tiles)) after the tiles line. I was using this to debug, and my prints started showing up. I'm pretty confused right now.
Upvotes: 1
Views: 836
Reputation: 489
TL;DR -
A list comprehension can do a lot of what you want here. [x for x in mylist if x > y] is a powerful expression that more than replaces filter(). It's also a nice alternative to map(), and it avoids the extra function call per element that you pay when you wrap an existing function in a lambda. It also produces a list right away instead of a lazy iterator, which is probably preferable in your case. (If you're dealing with huge streams of data, you might want to stick with map and filter, because a lazy iterator doesn't have to keep the whole thing in RAM; it works out one value at a time.) If you like this suggestion and want to skip the talk, the code is in 2b.
Don't write a lambda expression for a function that already exists! A lambda is a stand-in for a function you haven't defined. Wrapping an existing function in one only adds an extra call per element and makes the code harder to read. You could replace the lambda in your map() call with the function itself: tiles = map(process_slide_index, tiles_index)
The long version:
There are two problems, and both are pretty easy to fix. The first one is more of a style/efficiency thing, but it'll save you some obscure headaches, too:
1. Instead of creating a lambda expression, it's best to use the function you already went to the work of defining!
tiles = map(process_slide_index, tiles_index)
does the job just fine, and behaves better.
2. You should probably switch to list comprehensions. Why? Because map() and filter() are uglier, and they're slower if you have to use a lambda or want to convert the output to a list afterwards. Still, if you insist on using map() and filter()...
2a. When you need to pass multiple arguments into a function for map() or filter(), try functools.partial if you know some of the values ahead of time. I think there may be an error in your logic in this line:
filtered_tiles = filter(lambda x: keep_tile(x, tile_size, tissue_threshold), tiles)
What you're telling it to do is to call keep_tile() once for each x in tiles while holding tile_size and tissue_threshold constant.
If this is the intended behavior, try import functools and use functools.partial(keep_tile, tile_size, tissue_threshold).
Note: functools.partial binds positional arguments to the leftmost parameters, so for the positional call above you'd have to rewrite the function header as def keep_tile(tile_size, tissue_threshold, tiles): instead of def keep_tile(tiles, tile_size, tissue_threshold):. Alternatively, keep the header as it is and bind the values by keyword: functools.partial(keep_tile, tile_size=tile_size, tissue_threshold=tissue_threshold). (See that we again manage to avoid a lambda expression!)
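As a rough sketch of the functools.partial approach: the real keep_tile() is not shown in the question, so the version below is a hypothetical stand-in with made-up tile data, used only to show how the two constant values get bound by keyword.

```python
import functools

# Hypothetical stand-in for the asker's keep_tile(); the real one is not shown.
# A tile here is (slide_num, pixels), with pixels as a plain list of floats.
def keep_tile(tile, tile_size, tissue_threshold):
    slide_num, pixels = tile
    mean = sum(pixels) / len(pixels)
    return len(pixels) == tile_size and mean >= tissue_threshold

# Made-up data for illustration only.
tiles = [(1, [0.9, 0.8]), (1, [0.1, 0.2]), (1, [0.7])]

# Bind the two constant values by keyword; the tile stays the only free argument.
keep = functools.partial(keep_tile, tile_size=2, tissue_threshold=0.5)

# filter() is lazy in Python 3, so wrap it in list() to actually run it.
filtered_tiles = list(filter(keep, tiles))
print(filtered_tiles)
```

Binding by keyword means the function header can stay as def keep_tile(tile, tile_size, tissue_threshold).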
If that isn't the intended behavior, and you wanted each of those values to change with every call, just pass tuples in! filter() hands each element to the function as a single argument, so make tiles a sequence of (tile, tile_size, tissue_threshold) tuples, have keep_tile() accept one such tuple, and call filter(keep_tile, tiles). If you just want the tile variable from this, you can use a list comprehension:
[x[0] for x in filter(keep_tile, tiles)]
(Again, with no lambdas.) However, since we're already doing a list comprehension here, you might want to try the solution in 2b.
2b. It's generally faster and cleaner on later Python releases just to use a list comprehension such as [x[0] for x in tiles if keep_tile(*x)]. (Or, if you meant to hold the other two values constant, you could use [x for x in tiles if keep_tile(x, tile_size, tissue_threshold)].) Any time you're just going to read the output of map() or filter() into a list afterwards, you should probably have used a list comprehension instead. At this point map() and filter() are really only useful for streaming results through a pipeline, or for async routines.
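The two comprehension forms from 2b can be sketched side by side. The keep_tile() below is a hypothetical placeholder that works on plain numbers, and the values are made up; only the shape of the comprehensions matters.

```python
# Hypothetical placeholder for keep_tile(); tiles are plain numbers here.
def keep_tile(tile, tile_size, tissue_threshold):
    return tile >= tissue_threshold * tile_size

# Form 1: tile_size and tissue_threshold are held constant for every tile.
tile_size, tissue_threshold = 10, 0.5
tiles = [2, 5, 9]
kept_constant = [x for x in tiles if keep_tile(x, tile_size, tissue_threshold)]

# Form 2: each element carries its own (tile, tile_size, tissue_threshold),
# and *x unpacks the tuple into the three arguments.
triples = [(2, 10, 0.5), (5, 10, 0.9), (9, 10, 0.5)]
kept_varying = [x[0] for x in triples if keep_tile(*x)]

print(kept_constant)
print(kept_varying)
```

Both comprehensions run immediately and return ordinary lists, so there is nothing left to consume afterwards.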
Upvotes: 0
Reputation: 42716
You are using Python 3. In Python 2, map and filter return a list, but in Python 3 they return a lazy iterator that you have to consume to get the values:
>>> l = list(range(10))
>>> def foo(x):
...     print(x)
...     return x+1
...
>>> map(foo, l)
<map object at 0x7f69728da828>
To consume this object, you can use list, for example. Notice how print is called this time:
>>> list(map(foo, l))
0
1
2
3
4
5
6
7
8
9
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
These objects are lazy, which means that they yield the values one by one. Check the difference when using them as iterators in a for loop:
>>> for e in map(foo, l):
...     print(e)
...
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
Using list does the same, but stores each value it takes in that list.
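Applied to the question's map-then-filter chain, the same laziness can be sketched as follows. The process() and keep() functions are simplified stand-ins for process_slide_index() and keep_tile(), not the asker's real code.

```python
calls = []

# Simplified stand-in for process_slide_index(): records each call.
def process(x):
    calls.append(x)
    return x * 2

# Simplified stand-in for keep_tile().
def keep(x):
    return x > 2

tiles = map(process, [1, 2, 3])
filtered = filter(keep, tiles)

# Nothing has run yet: both objects are lazy iterators.
calls_before = list(calls)

# Consuming the filter pulls values through map one at a time.
result = list(filtered)
calls_after = list(calls)

# An iterator can only be consumed once, so tiles is now empty.
leftover = list(tiles)

print(calls_before, result, calls_after, leftover)
```

This is also why list(map(print, tiles)) made the checkpoint message appear: it consumed tiles, which forced the mapped function to run. It also left tiles exhausted for any later filter, so converting once with tiles = list(map(...)) is the safer pattern if the values are needed more than once.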
Upvotes: 3
Reputation: 640
You should remove the lambda from your map call. map will call the function provided as its first argument, and in your case you have provided a wrapper around the function you actually want to call.
tiles = map(process_slide_index, tiles_index)
Upvotes: 1