ice13berg

Reputation: 713

Trying to multiprocess a function requiring a list argument in Python

My problem is that I'm trying to pass a list as an argument to a function, and I'd like to multi-thread the processing of that function. I can't seem to use pool.map because it only accepts iterables. I can't seem to use pool.apply because it blocks until the call completes, so I don't understand how it allows multi-threading at all (admittedly, I don't seem to understand much about multi-threading). I tried pool.apply_async, but the program finishes in seconds and only appears to process about 20,000 total computations. Here's some pseudo-code for it.

import MySQLdb
from multiprocessing import Pool

def some_math(x, y):
    # f is a placeholder for the real distance formula
    # applied to two (id, lat, lng) rows
    return f(x[1], x[2], y[1], y[2])

def distance(x):
    x_distances = []
    for y in all_y:
        d = some_math(x, y)
        if d > 1000000:
            continue
        x_distances.append((x[0], y[0], d))
    # write all rows for this x in one batch, after the loop
    mysql.executemany(sql_update, x_distances)
    mydb.commit()

all_x = []
all_y = []
sql_x = 'SELECT id, lat, lng FROM table'
sql_y = 'SELECT id, lat, lng FROM table'
sql_update = 'INSERT INTO distances (id_x, id_y, distance) VALUES (%s, %s, %s)'

cursor.execute(sql_x)
all_x = cursor.fetchall()

cursor.execute(sql_y)
all_y = cursor.fetchall()

p = Pool(4)
for x in all_x:
    p.apply_async(distance, x)

Or, if using map:

p = Pool(4)
for x in all_x:
    p.map(distance, x)

The run prints my progress message, "Processing A for distances...", and then errors out:

Traceback (most recent call last):
  File "./distance-house.py", line 94, in <module>
    p.map(range, row)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'float' object has no attribute '__getitem__'

I am trying to multi-thread a long computation: calculating the distance between something like 10,000 points on a many-to-many basis. Currently the process takes several days, and I figure multiprocessing the work could really improve the efficiency. I'm all ears for suggestions.
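For clarity, here's a minimal runnable sketch of what I'm after, with the MySQL parts replaced by in-memory lists and a made-up planar formula standing in for the real distance calculation:

```python
from multiprocessing import Pool

# Stand-ins for the fetched rows: (id, lat, lng) tuples.
all_x = [(i, float(i), float(i)) for i in range(5)]
all_y = [(j, float(j), 2.0 * j) for j in range(5)]

def some_math(x, y):
    # Made-up planar "distance" in place of the real formula.
    return abs(x[1] - y[1]) + abs(x[2] - y[2])

def distance(x):
    # One worker call handles one full (id, lat, lng) row of all_x.
    pairs = []
    for y in all_y:
        d = some_math(x, y)
        if d <= 1000000:
            pairs.append((x[0], y[0], d))
    return pairs

if __name__ == '__main__':
    p = Pool(4)
    results = p.map(distance, all_x)  # one call per row of all_x
    p.close()
    p.join()
```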

Upvotes: 0

Views: 1133

Answers (2)

Gabe

Reputation: 111

Another way to approach it is to pack your variables inside a tuple and unpack them inside the function. Example:

def Add(z):
    x, y = z
    return x + y

a = [0, 1, 2, 3]
b = [5, 6, 7, 8]
ab = (a, b)

Add(ab)
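A sketch of how the same packing idea combines with pool.map: zip the two argument lists so each worker call receives one packed pair (the zip pairing and the names here are my own, not from your code):

```python
from multiprocessing import Pool

def add(z):
    x, y = z           # unpack the packed pair inside the worker
    return x + y

a = [0, 1, 2, 3]
b = [5, 6, 7, 8]

if __name__ == '__main__':
    p = Pool(2)
    # zip() packs one (x, y) tuple per call
    sums = p.map(add, zip(a, b))  # [5, 7, 9, 11]
    p.close()
    p.join()
```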

Upvotes: 0

Yulia V

Reputation: 3559

You can use pool.map:

p = Pool(4)
p.map(distance, all_x)

as per the first example in the doc. It will do the iteration for you!
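The distinction matters because `p.map(distance, x)` inside a `for x in all_x:` loop iterates over a single row, handing each float in it to `distance` one at a time, which is exactly the `'float' object has no attribute '__getitem__'` error in your traceback. A small sketch of the difference, using a hypothetical `first_element` helper:

```python
from multiprocessing import Pool

def first_element(row):
    # expects a full (id, lat, lng) row, not a bare number
    return row[0]

rows = [(1, 50.0, 4.0), (2, 51.0, 5.0)]

if __name__ == '__main__':
    p = Pool(2)
    ids = p.map(first_element, rows)  # one call per row
    p.close()
    p.join()
    # p.map(first_element, rows[0]) would instead call first_element
    # on each float of the first row, raising the TypeError seen
    # in the question.
```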

Upvotes: 1
