Accessing a variable assigned in multiprocessed function

Question

I'm making a website, and on startup, I want to launch another process that starts loading an embedding model because this takes a long time and will be needed by the user eventually. This is my code:

from flask import Flask, render_template
from flask_socketio import SocketIO, send
import bot
import sys
sys.path = sys.path + ['filepath']
from BigLearnPy import BigLearn
from multiprocessing import Process

app = Flask(__name__)
app.config['SECRET_KEY'] = 'password'
socketio = SocketIO(app)

def loadModel():
    BigLearn.LoadEmbeddingEngine()
    emb = BigLearn.EmbeddingEngine('filepath')

@app.route('/')
def index():
    return render_template('index.html')

@socketio.on('message')
def handleMessage(msg):
    send(msg, broadcast=True)
    p1.join()
    send('0' + bot.getResponse(msg, emb), broadcast=True)
    send('2' + bot.getKB(msg, emb), broadcast=True)

if __name__ == '__main__':
    emb = None
    p1 = Process(target=loadModel)
    p1.start()
    socketio.run(app)

I start the process that loads the model right before I start running the app (the penultimate line), and I join it in the handleMessage function right before I need the value of emb. So that I can access emb outside of the loadModel function, I declared it right before creating the process. However, when I run the code, I get an error saying emb is a NoneType object. This seems like a scoping issue, but no matter where I put emb = None, emb is either None or undefined when I try to use it. How can I load the model in a different process and then access it? Thanks.

jbch · Accepted Answer

You cannot load the model in one process and then read it from a variable in another. That is not how multiprocessing works.

At the fork, each process gets its own copy of memory (conceptually; in practice there are tricks to avoid copying everything). Any change to a variable after the fork is visible only in the process that made it, not in its parent. That is why emb is still None in the parent process after p1.join(): the assignment happened in the child's copy.
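To see this in isolation, here is a minimal sketch, assuming a Unix-like system where the "fork" start method is available. The names load_model and run_in_child are made up for illustration, and a plain string stands in for the real model. Unlike loadModel in the question, the sketch declares emb as global, so the result cannot be blamed on local scoping:

```python
import multiprocessing as mp

emb = None  # module-level name, like `emb` in the question


def load_model():
    # Runs in the child process and rebinds the child's own copy of `emb`.
    global emb
    emb = "loaded model"  # hypothetical stand-in for the real model


def run_in_child():
    # Assumption: the "fork" start method exists (Unix-like systems only).
    ctx = mp.get_context("fork")
    p = ctx.Process(target=load_model)
    p.start()
    p.join()
    # The child has finished, but the parent's `emb` was never touched.
    return emb


if __name__ == "__main__":
    print(run_in_child())
```

Even with the global declaration and the join, the parent should still see None, because the assignment only ever happened in the child's memory.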

If you want to share memory you need to use threads, not processes. But safely mutating memory that is shared between threads is fairly complicated. In any case it might not help you that much, because Python has a Global Interpreter Lock: only one thread can execute Python bytecode at a time.

If you want to experiment with threads or processes I would recommend starting with simpler examples.

As for your problem, I would start by trying to optimize the loading code so it is faster. Without knowing what it does it is hard to make more specific suggestions.
