Nick DeRobertis

Reputation: 31

Python: Numpy.min replaces builtin: causes Pyro4 Error: reply sequence out of sync

EDIT: It turned out to be a dill bug, see answer.

EDIT: See the bottom of this post for my troubleshooting progress. The error turns out to be caused by numpy.min replacing the built-in min function.

I am working with Pyro (version 4.43, Python 3.5.1, Windows 10) and attempting to set up a simple cluster where a server process waits for workers and worker processes request work and send back results. Once the server receives the result, it does some further processing on it.

Currently I'm just trying to get it working on a single computer (using localhost and spawning worker processes from the same computer).

So far the server process runs, and the worker process can connect to the server, request data, and process that data. However, the worker process errors out when it tries to send the result back to the server.

I'm running into a strange error message:

  File "worker.py", line 90, in <module>
    main()
  File "worker.py", line 87, in main
    worker.send_result()
  File "worker.py", line 49, in send_result
    self.server.recieve(result)
  File "C:\Anaconda3\lib\site-packages\Pyro4\core.py", line 171, in __call__
    return self.__send(self.__name, args, kwargs)
  File "C:\Anaconda3\lib\site-packages\Pyro4\core.py", line 418, in _pyroInvoke
    self.__pyroCheckSequence(msg.seq)
  File "C:\Anaconda3\lib\site-packages\Pyro4\core.py", line 448, in __pyroCheckSequence
    raise errors.ProtocolError(err)
Pyro4.errors.ProtocolError: invoke: reply sequence out of sync, got 0 expected 2

After thorough searching, I could find only one other person who has had this error. The response there was that it was a pure Pyro bug and that he needed to update Pyro, but my version is far newer than the one that was current when that was written.

Furthermore, I'm having trouble reproducing this error outside of my production code. I tried to create a simplified version to narrow down where the error comes from, but it does not raise the error. I even sent a result from a worker in exactly the same form as the result sent in the production code, with no error.

Here is the simplified code, just to give an idea of the structure of my setup. The code below does not reproduce the error. I'm not sure what the next step is to bring it closer to the production code without overcomplicating it.

Server code:

#simple_server.py

import Pyro4
import sys, dill

class SimpleServer:

    def serve(self):
        with open('served data.pkl', 'rb') as f:
            data = dill.load(f) #actual data coming from production code
        return data

    def recieve(self, result):
        print(result)

def main():
    Pyro4.config.SERIALIZER = 'dill' #default serpent serializer doesn't work
    dill.settings['recurse'] = True #dill won't work without this option

    server = SimpleServer()
    daemon = Pyro4.Daemon()
    server_uri = daemon.register(server)
    ns = Pyro4.locateNS()
    ns.register("test", server_uri)
    print('Server running.')
    daemon.requestLoop()

if __name__ == '__main__':
    main()    

Worker code:

#simple_worker.py

import Pyro4
import sys, dill
import numpy as np
import scipy.optimize as opt

class SimpleWorker:

    def __init__(self, server):
        self.server = server

    def recieve_data(self):
        self.data = self.server.serve()

    def send_result(self):
        res = opt.basinhopping(lambda x: sum(x), np.arange(11), niter=2, minimizer_kwargs={'options':{'maxiter':2}})
        #This below data structure is the same that I send in production
        result = ('ABCD', 'filename.csv', res, 6)
        self.server.recieve(result) #creates error in production code but not here

def main():
    sys.excepthook = Pyro4.util.excepthook #gives a more meaningful stack trace
    Pyro4.config.SERIALIZER = 'dill' #default serpent serializer doesn't work
    dill.settings['recurse'] = True #dill won't work without this option

    server = Pyro4.Proxy('PYRONAME:test') #connects to pinest server
    worker = SimpleWorker(server)
    worker.recieve_data() 
    worker.send_result()

if __name__ == '__main__':
    main()

Windows CMD code:

#run_simple_server.bat
set PYRO_SERIALIZERS_ACCEPTED=serpent,json,marshal,pickle,dill
start cmd /C python -m Pyro4.naming
python simple_server.py
pause

#run_simple_worker.bat
python simple_worker.py
pause

Note: I need to use dill with the recurse option to send these types of data.

If I print Pyro4.current_context.seq within the worker's main, it prints 0. Setting Pyro4.current_context.seq = 2 does not affect the error.

Does anyone know how to deal with this error or what I should do next in attempting to troubleshoot?

EDIT: After reviewing the Pyro4 source, it seems that this error is raised because of a coding error in Pyro4. In core.Daemon.handleRequest, if an error occurs while receiving the message, the daemon sets its own message sequence to zero and tries to transmit the error as a message. But when core.Proxy._pyroInvoke receives that message, it has no facility to treat it as an error if the sequence is zero, so the "reply sequence out of sync" error is raised instead of the real one.
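To make that failure mode concrete, here is a toy model of the sequence check. It is not Pyro4's code: ToyReply, toy_server_reply, toy_check_sequence, and ToyProtocolError are made-up names, and real Pyro4 messages carry more than a sequence number and a payload. The sketch only illustrates why an error reply stamped with sequence 0 gets reported as "out of sync" while the underlying exception stays hidden.

```python
from collections import namedtuple

# Toy reply: a sequence number plus a payload (a result, or an exception).
ToyReply = namedtuple("ToyReply", ["seq", "payload"])


class ToyProtocolError(Exception):
    """Stand-in for a protocol-level error; not a Pyro4 class."""


def toy_server_reply(request_seq, receive_ok):
    """Build a reply. If receiving the request failed, the sequence falls back to 0."""
    if receive_ok:
        return ToyReply(seq=request_seq, payload="result")
    return ToyReply(seq=0, payload=ValueError("the real cause of the failure"))


def toy_check_sequence(expected_seq, reply):
    """Client-side check that only compares sequence numbers."""
    if reply.seq != expected_seq:
        raise ToyProtocolError(
            "reply sequence out of sync, got %d expected %d" % (reply.seq, expected_seq)
        )
    return reply.payload
```

With this model, toy_check_sequence(2, toy_server_reply(2, False)) raises ToyProtocolError with "got 0 expected 2", and the ValueError carried in the payload is never looked at.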

I have figured out the underlying issue that causes the error when receiving the message. socketutil.receiveData has a receive loop with a line that picks the minimum of 60000 and the remaining size of the message: min(60000, size - msglen). Somehow, when this executes, it uses numpy.min rather than the built-in min and errors out, because the second argument to numpy.min is supposed to be the axis number. This is surprising, as I only ever use import numpy as np in my code and never use from numpy import * or import the min function directly.

What's even more surprising is that I can't fix it by reassigning the built-in function. I tried import builtins followed by min = builtins.min, and the error persists. If I run inspect.getfile(builtins.min), it points to the NumPy file.
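A quick way to see which names are affected is to scan the builtins module itself. The sketch below is a heuristic, not an official check: it assumes that genuine builtins report __module__ == 'builtins', and find_replaced_builtins is a hypothetical helper name. The name list is the one used in the workaround further down in this post.

```python
import builtins

# Names that exist both in builtins and in the numpy namespace, per the list in this post.
SHARED_NAMES = ("abs", "all", "any", "bool", "complex", "float", "int",
                "max", "min", "object", "round", "str", "sum")


def find_replaced_builtins(names=SHARED_NAMES):
    """Return the names whose current object does not claim to come from 'builtins'."""
    replaced = []
    for name in names:
        obj = getattr(builtins, name)
        if getattr(obj, "__module__", None) != "builtins":
            replaced.append(name)
    return replaced


print(find_replaced_builtins())
```

In an unaffected interpreter this should print an empty list. Calling it before and after a suspect operation narrows down which call does the replacing.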

I tried to avoid the issue entirely by changing the line to min([60000, size - msglen]), which works with both numpy.min and the built-in min, but the replaced min persists back into my server code and breaks functions there too.

As a rather hackish fix, I kept the above change to the min call, and I also store the built-in functions when my server class is initialized:

#Store builtin functions as they later get replaced for some unknown reason
#(requires "import copy" at the top of the module)
b = [t for t in ().__class__.__base__.__subclasses__() if t.__name__ == 'Sized'][0].__len__.__globals__['__builtins__']
self.real_builtins = copy.copy(b) #shallow copy so that later changes to the live dict don't show up here

Then every time the server receives or sends data, I run this function first:

def fix_builtins(self):
    import builtins
    #These are all of [i for i in dir(builtins) if i in dir(numpy)]
    shared_names = ('abs', 'all', 'any', 'bool', 'complex', 'float', 'int',
                    'max', 'min', 'object', 'round', 'str', 'sum')
    #Put the stored originals back into the live builtins module
    for name in shared_names:
        setattr(builtins, name, self.real_builtins[name])

This seems to be working now, but it is obviously not a great way to fix the problem. I would rather stop the builtin functions from being replaced in the first place. Is this some Pyro-specific issue?
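If the replacement happens inside a call that can be wrapped, a somewhat tidier variant of the same workaround is a context manager that snapshots builtins on entry and restores any changed names on exit. This is only a sketch under that assumption. preserved_builtins and misbehaving_call are hypothetical names, and it still treats the symptom rather than the cause. It only helps if builtins are intact when the block is entered, and it does not remove names that were newly added.

```python
import builtins
from contextlib import contextmanager

_MISSING = object()  # sentinel for names that vanished during the block


@contextmanager
def preserved_builtins():
    """Snapshot the builtins namespace on entry and restore changed names on exit."""
    snapshot = dict(vars(builtins))
    try:
        yield
    finally:
        for name, value in snapshot.items():
            if getattr(builtins, name, _MISSING) is not value:
                setattr(builtins, name, value)


def misbehaving_call():
    """Hypothetical stand-in for a call that clobbers a builtin as a side effect."""
    builtins.min = lambda a, axis=None: "not the real min"


with preserved_builtins():
    misbehaving_call()

print(min(3, 1))  # 1, because the real min was put back on exit
```

Compared with a hard-coded list of names, this restores whatever changed, at the cost of copying the builtins dict on every entry.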

Upvotes: 1

Views: 159

Answers (1)

Nick DeRobertis

Reputation: 31

This is a dill bug, triggered by pickling a lambdified SymPy expression. The following code reproduces the error:

from sympy import symbols, lambdify
import dill, inspect

def check_if_builtin(func):
    try:
        file = inspect.getsourcefile(func) #will throw TypeError for builtin
        return file
    except TypeError:
        return True



dill.settings['recurse'] = True #without this option, throws PicklingError

a, b, c = symbols("a b c")
expr = a + b + c
lambda_expr = lambdify([a, b, c], expr)

print(check_if_builtin(min))

with open('test.p', 'wb') as f: #close the file once the dump is done
    dill.dump(lambda_expr, f)

print(check_if_builtin(min))

This prints:

True
C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py

I have submitted this as dill issue #167.

Upvotes: 1
