christophe

Reputation: 25

Flask-socketio misses events while copying file in background thread

(Complete test app on github: https://github.com/olingerc/socketio-copy-large-file)

I am using Flask together with the Flask-SocketIO plugin. My clients can ask the server to copy files via websocket but while the files are copying, I want the clients to be able to communicate with the server to ask it to do other things. My solution is to run the copy process (shutil) in a background thread. This is the function:

import os
import shutil

# socketio is the Flask-SocketIO instance created in the app
def copy_large_file():
    source = "/home/christophe/Desktop/largefile"
    destination = "/home/christophe/Desktop/largefile2"
    try:
        os.remove(destination)
    except OSError:
        pass
    print("Before copy")
    socketio.emit('my_response',
                  {'data': 'Thread says: before'}, namespace='/test')
    shutil.copy(source, destination)
    print("After copy")
    socketio.emit('my_response',
                  {'data': 'Thread says: after'}, namespace='/test')

I observe the following behavior: When starting the function using the native socketio method:

socketio.start_background_task(target=copy_large_file)

all incoming events while a large file is being copied are delayed until the file is finished and the next file is started. I guess shutil is not releasing the GIL or something like that, so I tested with threading:

thread = threading.Thread(target=copy_large_file)
thread.start()

Same behaviour. Maybe multiprocessing?

process = multiprocessing.Process(target=copy_large_file)
process.start()

Ah! That works and signals emitted via socketio within the copy_large_file function are correctly received. BUT: If a user starts to copy a very large file, closes their browser and comes back 2 minutes later, the socket no longer connects to the same socketio "session?" and thus no longer receives messages emitted from the background process.

I guess the main question is: How can I copy large files in the background without blocking Flask-SocketIO, while still being able to emit signals to the client from within the background process?

The test app can be used to reproduce the behaviour.

Upvotes: 1

Views: 1168

Answers (1)

Miguel Grinberg

Reputation: 67489

You are asking two separate questions.

First, let's discuss the actual copying of the file.

It looks like you are using eventlet for your server. While this framework provides asynchronous replacements for network I/O functions, disk I/O is much more complicated to do in a non-blocking fashion, in particular on Linux (some info on the problem here). So doing I/O on files even with the standard library monkey patched causes blocking, as you have noticed. This is the same with gevent, by the way.

A typical solution to perform non-blocking I/O on files is to use a thread pool. With eventlet, the eventlet.tpool.execute function can do this. So basically, instead of calling copy_large_file() directly, you will call tpool.execute(copy_large_file). This will enable other green threads in your application to run while the copy takes place in another system thread. Your solution of using another process is also valid, by the way, but it may be overkill depending on how many times and how frequently you need to do one of these copies.
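A minimal sketch of what this looks like, assuming an eventlet-based Flask-SocketIO server (the paths and helper names are placeholders, not from the question's app):

```python
import shutil


def copy_large_file(source, destination):
    # Blocking disk I/O; safe because it will run in a real OS thread
    # from eventlet's thread pool, not in the event loop itself.
    shutil.copy(source, destination)


def start_copy(source, destination):
    # tpool.execute runs copy_large_file in eventlet's thread pool and
    # blocks only the calling green thread; other green threads (and
    # therefore other Socket.IO events) keep being served meanwhile.
    from eventlet import tpool
    tpool.execute(copy_large_file, source, destination)
```

The key point is that `tpool.execute` takes the function and its arguments, so the green thread that requested the copy waits for the result without starving the rest of the server.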

Your second question is related to "remembering" a client that starts a long file copy, even if the browser is closed and reopened.

This is really something your application needs to handle by storing the state that is necessary to restore a returning client. Presumably your clients have a way to identify with your application, either with a token or some other identification. When the server starts one of these file copies, it can assign an id to the operation, and store that id in a database, associated with the client that requested it. If the client goes away and then returns, you can find if there are any ongoing file copies for it, and that way sync the client back to the way it was before it closed the browser.
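As a rough illustration of that idea (all names here are hypothetical, and a real application would persist this in a database rather than a module-level dict):

```python
import uuid

# job_id -> {"client": client_token, "done": bool}
jobs = {}


def start_copy_job(client_token):
    # Assign an id to the copy operation and remember which
    # client requested it.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"client": client_token, "done": False}
    return job_id


def pending_jobs(client_token):
    # When a client reconnects, look up its unfinished copies so the
    # UI can be synced back to where it was before the browser closed.
    return [job_id for job_id, job in jobs.items()
            if job["client"] == client_token and not job["done"]]
```

On reconnect, the client would send its token, the server would call `pending_jobs` and emit the current state of each unfinished copy back to it.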

Hope this helps!

Upvotes: 2
