yiwei

Reputation: 4190

python converting map to list taking a long time

EDIT: I'm using Python 3.5.0, so map() returns an iterator instead of a list, unlike in Python 2.x.

I have a list of units, and I call a REST API for each one to fetch more data about it. I'm using map() to do this, but when I convert the map to a list, the program hangs at that line and doesn't proceed (both when I run it and when I debug it):

data = list(map(lambda product: client.request(units_url + "/" + product), units))

At first I thought the problem might be calling the API too quickly, but when I iterate over the map manually (without converting it to a list) and print each item, it works fine:

data = map(lambda product: client.request(units_url + "/" + product), units)
for item in data:
    print(item)    # <-- this works just fine for the entire map

Anyone know why I'm getting this behavior?

Upvotes: 4

Views: 2173

Answers (2)

ShadowRanger

Reputation: 155546

When you list-ify the map, every request is dispatched serially: each one runs, waits for completion, and has its result stored in the list before the next one starts. If you're dispatching 1000 requests, all 1000 must complete, one by one, before the list is constructed and you see the first result; it's entirely synchronous.

You get results (almost) immediately in the direct map iteration case because it only makes one request at a time; instead of waiting for 1000 requests, it waits for 1, you process that result, then it waits for another, etc.
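To make the difference concrete, here is a minimal sketch of map's lazy behavior. The fake_request and traced_request functions are hypothetical stand-ins for client.request (which isn't shown in the question); they sleep briefly instead of hitting a real API and record each call so you can see when the work happens.

```python
import time

def fake_request(product):
    """Hypothetical stand-in for client.request; sleeps to mimic network latency."""
    time.sleep(0.01)
    return "data for " + product

units = ["a", "b", "c"]
calls = []  # records which products have actually been requested

def traced_request(product):
    calls.append(product)
    return fake_request(product)

lazy = map(traced_request, units)  # builds the iterator; no request is made yet
calls_before = len(calls)

first = next(lazy)                 # pulls one item, so exactly one request runs
calls_after_first = len(calls)

rest = list(lazy)                  # drains the iterator: remaining requests run one by one
```

Nothing is requested until an item is pulled from the iterator, which is why the for loop prints its first result right away while list() appears to stall until every request has finished.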

If the goal is to minimize latency, take a look at multiprocessing.Pool.imap (or the thread-based version of the pool implemented in multiprocessing.dummy; threads can be ideal for parallel network I/O and don't require pickling data for IPC). With the Pool's map, imap, or imap_unordered methods (choose one based on your needs), the requests are dispatched concurrently, several at a time, depending on the number of workers you select.

If you absolutely must have a list, Pool.map will usually construct it faster. If you can iterate directly and don't care about the order of results, Pool.imap_unordered yields results as fast as the workers produce them, in whatever order they complete. Plain map without a Pool gives you no performance benefit here (a list comprehension would usually run slightly faster, in fact), so use a Pool.

Simple example code for fastest results:

import multiprocessing.dummy as multiprocessing  # Import thread based version of library; for network I/O should work fine

with multiprocessing.Pool(8) as pool:  # Pool of eight worker threads
    for item in pool.imap_unordered(lambda product: client.request(units_url + "/" + product), units):
        print(item)

If you really need to, you can use Pool.map and store to a real list, and assuming you have the bandwidth to run eight parallel requests (or however many workers you configure the pool for), that should (roughly) divide the time to complete the map by eight.
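A minimal sketch of that Pool.map variant follows, assuming a thread pool from multiprocessing.dummy. The fake_request function and the units_url value are placeholders invented for illustration (the real client and URL aren't shown in the question), so swap in your own client.request call.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the same API

def fake_request(url):
    """Hypothetical stand-in for client.request; sleeps to mimic network latency."""
    time.sleep(0.05)
    return "response from " + url

units_url = "http://example.invalid/units"  # placeholder URL, not a real endpoint
units = ["u%d" % i for i in range(16)]       # placeholder unit identifiers

def fetch(product):
    return fake_request(units_url + "/" + product)

with Pool(8) as pool:              # eight worker threads
    data = pool.map(fetch, units)  # blocks until all results are in; returns a real list
```

Unlike imap_unordered, Pool.map returns results in the same order as the input, so data[i] corresponds to units[i].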

Upvotes: 2

RobertB

Reputation: 1929

Better answer than I previously had. Check out this link. Near the bottom of that answer is a good analysis of why you should really use a list comprehension:

data = [ client.request(units_url + "/" + product) for product in units ]

Upvotes: -1
