BitOfABeginner

Reputation: 122

multiprocessing.Pool.map() not working as expected

I understand from simple examples that Pool.map is supposed to behave identically to the 'normal' python code below except in parallel:

def f(x):
    # complicated processing
    return x + 1

y_serial = []
x = range(100)
for i in x: y_serial += [f(i)]
y_parallel = pool.map(f, x)
# y_serial == y_parallel!

However I have two bits of code that I believe should follow this example:

#Linear version
price_datas = []

for csv_file in loop_through_zips(data_directory):
    price_datas += [process_bf_data_csv(csv_file)]

#Parallel version
p = Pool()
price_data_parallel = p.map(process_bf_data_csv, loop_through_zips(data_directory))

However, the parallel code doesn't work, whereas the linear code does. From what I can observe, the parallel version appears to loop through the generator (it prints log lines from the generator function) but then never actually calls process_bf_data_csv. What am I doing wrong here?
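For reference, here is a minimal, self-contained sketch (all names hypothetical, standing in for process_bf_data_csv and loop_through_zips) illustrating the expected behaviour: Pool.map drains the whole generator up front, then blocks until every worker result is ready. Note the `if __name__ == "__main__"` guard, which multiprocessing requires on platforms that spawn rather than fork (e.g. Windows); without it the workers may never run.

```python
from multiprocessing import Pool

def square(x):
    # stand-in for a slow per-item task
    return x * x

def gen():
    # stand-in generator, analogous to loop_through_zips
    for i in range(5):
        yield i

if __name__ == "__main__":
    with Pool() as p:
        # Pool.map consumes the entire generator before dispatching work,
        # then blocks until all results are in.
        results = p.map(square, gen())
    print(results)  # [0, 1, 4, 9, 16]
```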

Upvotes: 0

Views: 1098

Answers (1)

Ayy

Reputation: 488

.map tries to pull all values from your generator to build a complete list before any work actually starts. Try waiting longer (until the generator is exhausted), or use multithreading and a queue instead.
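If the goal is to avoid materialising everything up front, one common alternative (not mentioned in the answer above) is Pool.imap, which pulls items from the iterable incrementally and yields results in order as they complete. A minimal sketch with hypothetical stand-ins for the question's functions:

```python
from multiprocessing import Pool

def process(item):
    # placeholder for per-file work such as process_bf_data_csv
    return item.upper()

def csv_files():
    # placeholder generator standing in for loop_through_zips
    for name in ("a.csv", "b.csv", "c.csv"):
        yield name

if __name__ == "__main__":
    with Pool(2) as p:
        # imap consumes the generator lazily instead of all at once
        for result in p.imap(process, csv_files()):
            print(result)
```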

Upvotes: 2

Related Questions