Reputation: 66053
EDIT: This is a sympy bug. I have moved the discussion to https://github.com/sympy/sympy/issues/7457
I have a Python program that uses sympy to perform some core functionality that involves taking the intersection of a line and a shape. This operation needs to be performed several thousand times, and is quite slow when using the default sympy pure Python modules.
I attempted to speed this up by installing gmpy 2.0.3 (I have also tried with gmpy 1.5). This does lead to the code speeding up somewhat, but when using multiprocessing to gain a further speed-up, the program crashes with a TypeError.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\python27\lib\multiprocessing\pool.py", line 376, in _handle_results
    task = get()
  File "C:\python27\lib\site-packages\sympy\geometry\point.py", line 91, in __new__
    for f in coords.atoms(Float)]))
  File "C:\python27\lib\site-packages\sympy\simplify\simplify.py", line 3839, in nsimplify
    return _real_to_rational(expr, tolerance)
  File "C:\python27\lib\site-packages\sympy\simplify\simplify.py", line 3781, in _real_to_rational
    r = nsimplify(float, rational=False)
  File "C:\python27\lib\site-packages\sympy\simplify\simplify.py", line 3861, in nsimplify
    exprval = expr.evalf(prec, chop=True)
  File "C:\python27\lib\site-packages\sympy\core\evalf.py", line 1300, in evalf
    re = C.Float._new(re, p)
  File "C:\python27\lib\site-packages\sympy\core\numbers.py", line 673, in _new
    obj._mpf_ = mpf_norm(_mpf_, _prec)
  File "C:\python27\lib\site-packages\sympy\core\numbers.py", line 56, in mpf_norm
    rv = mpf_normalize(sign, man, expt, bc, prec, rnd)
TypeError: ('argument is not an mpz', <class 'sympy.geometry.point.Point'>, (-7.07106781186548, -7.07106781186548))
The program works fine when run in a single process using gmpy, and when run without gmpy using multiprocessing.Pool.
Has anyone run into this sort of problem before? The program below reproduces this problem:
import functools
import sympy
import multiprocessing
import numpy

def thread_function(func, data, output_progress=True, extra_kwargs=None, num_procs=None):
    # Bind any extra keyword arguments before handing func to the pool.
    if extra_kwargs:
        func = functools.partial(func, **extra_kwargs)
    if not num_procs:
        num_procs = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=num_procs)
    # data.T yields one (x, y) row per task.
    results = pool.map_async(func, data.T)
    pool.close()
    pool.join()
    return results.get()

def test_fn(data):
    x = data[0]
    y = data[1]
    circle = sympy.Circle((0,0), 10)
    line = sympy.Line(sympy.Point(0,0), sympy.Point(x,y))
    return line.intersection(circle)[0].evalf()

if __name__ == '__main__':
    data = numpy.vstack((numpy.arange(1, 100), numpy.arange(1, 100)))
    print thread_function(test_fn, data) #<--- this line causes the problem
    # print [test_fn(data[:,i]) for i in xrange(data.shape[1])] #<--- this one runs without errors
Upvotes: 0
Views: 806
Reputation: 11424
I've verified that gmpy objects are picklable and that mpmath.mpf objects that use gmpy are also picklable.
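A check of this kind can be sketched as follows. This is only a minimal illustration, not the exact test I ran: the helper names survives_pickle and check_installed_backends are made up for this example, and the second function simply skips any package that is not installed.

```python
import pickle


def survives_pickle(obj):
    """Return True if obj compares equal to its copy after a pickle round trip."""
    clone = pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    return clone == obj


def check_installed_backends():
    """Run the round-trip check on gmpy2.mpz and mpmath.mpf where available.

    Hypothetical helper: packages that are not installed are skipped, so the
    returned dict may be empty.
    """
    results = {}
    try:
        import gmpy2
        results['gmpy2.mpz'] = survives_pickle(gmpy2.mpz(12345))
    except ImportError:
        pass
    try:
        import mpmath
        results['mpmath.mpf'] = survives_pickle(mpmath.mpf('1.5'))
    except ImportError:
        pass
    return results
```

Calling check_installed_backends() in the environment that shows the crash should report True for each installed backend if the objects themselves round-trip cleanly, which would point the search away from pickling of the raw number types.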
The error occurs when the man argument to mpf_normalize() is not a gmpy object. If I force man to be an mpz, then I no longer get an error. But the answer is different from the single process version.
Single process version:
Point(-223606797749979/50000000000000, -223606797749979/25000000000000)
Multiple process version:
Point(-7.07106781186548, -7.07106781186548)
Both the types used in Point() (rational vs. float) and the values differ (-223606797749979/50000000000000 is -4.47213595499958).
I'm still researching and will update this answer if I discover the root cause.
Update #1: The differing values were caused by an error in the example code. The threaded function was passed different values than the non-threaded version.
I'm still tracking down why multiprocessing triggers the exception. I've reduced the problem to the following example:
import functools
import sympy
import multiprocessing
import numpy

def thread_function(func, data, output_progress=True, extra_kwargs=None, num_procs=None):
    # Bind any extra keyword arguments before handing func to the pool.
    if extra_kwargs:
        func = functools.partial(func, **extra_kwargs)
    if not num_procs:
        num_procs = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=num_procs)
    results = pool.map_async(func, data)
    pool.close()
    pool.join()
    return results.get()

def test_fn(data):
    # The argument is ignored; the Point alone is enough to trigger the error.
    return sympy.Point(0,1).evalf()

if __name__ == '__main__':
    test_size = 10
    print [test_fn(None) for i in xrange(1, test_size)] #<--- this one runs without errors
    print thread_function(test_fn, [None] * (test_size - 1)) #<--- this line causes the problem
Upvotes: 1