Vilmar

Reputation: 408

Flask memory spike when processing a big JSON request

I have a Flask app (Flask version 1.0.2) that processes XML documents. The documents are POSTed as JSON, for example:

import requests
import sys
import json
with open('big.xml', encoding="utf-8") as f:
    xml_string = f.read()
    print(sys.getsizeof(xml_string) // 1024 // 1024)
    # 283
    gid = "FOO"
    json_data = json.dumps({"file_content": xml_string, "self_id": gid})
    print(sys.getsizeof(json_data) // 1024 // 1024)
    # 305
    result_json = requests.post("http://my_server:8080/api", data=json_data, headers={"Content-Type": "application/json"})

As you can see, the XML files can be quite big, around 300 MB in this example.

My Flask app, simplified, looks like this:

from flask import Flask, request, jsonify
from memory_profiler import profile

app = Flask(__name__)

@app.route('/api', methods=['POST'])
@profile
def api():
    input_data = request.get_json()
    output_data = {"id": "FOO"}
    response = jsonify(output_data)
    return response

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080, debug=True)

While the request is being posted, the memory usage of the Flask app spikes to ~2.8 GB. The memory profile of the code above shows nowhere near that:

Line #    Mem usage    Increment   Line Contents
================================================
     6     27.8 MiB     27.8 MiB   @app.route('/api', methods=['POST'])
     7                             @profile
     8                             def api():
     9    617.3 MiB    589.5 MiB       input_data = request.get_json(request.data)
    10    617.3 MiB      0.0 MiB       output_data = {"id": "FOO"}
    11    617.3 MiB      0.0 MiB       response = jsonify(output_data)
    12    617.3 MiB      0.0 MiB       return response

What am I missing? What causes this big memory spike, and how can I deal with it?

Upvotes: 1

Views: 2814

Answers (1)

Bastian

Reputation: 10433

I guess you could save a lot of memory if you didn't wrap the XML in a JSON structure and instead sent the extra information separately, for example in headers.
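A minimal sketch of the server side of that idea, assuming the XML is sent as the raw request body and the id travels in a header. The helper name spool_to_disk and the header name X-Self-Id are made up for illustration. The helper only needs a binary file-like object with read(), which is what Flask's request.stream provides:

```python
import io
import tempfile

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time; the value is arbitrary


def spool_to_disk(stream, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like stream to a temporary file in fixed-size chunks.

    Returns (temp_file, total_bytes). Only one chunk is held in memory at a
    time, so peak usage does not grow with the size of the upload.
    """
    tmp = tempfile.TemporaryFile()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        tmp.write(chunk)
        total += len(chunk)
    tmp.seek(0)  # rewind so the caller can read from the start
    return tmp, total


if __name__ == "__main__":
    # Stand-in for request.stream: any binary file-like object works.
    fake_body = io.BytesIO(b"<root>" + b"<item/>" * 1000 + b"</root>")
    tmp, size = spool_to_disk(fake_body)
    print(size)
    tmp.close()
```

In a Flask view this would be called as spool_to_disk(request.stream), with the id read from request.headers.get("X-Self-Id"). As far as I know the stream has to be read before anything else consumes the body, so don't touch request.data or get_json first. On the client side, requests streams the upload if you pass a file opened in binary mode as data= instead of a string.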

Functions like get_json and jsonify are convenient but not optimized for low memory usage. They probably copy the data before processing, so it ends up in memory multiple times.
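To see the "multiple copies" effect outside Flask, here is a rough sketch using only the standard library. It measures the peak memory Python allocates while json.loads parses a bytes payload shaped like the one in the question. The function name and the payload size are illustrative, and this only measures the parser, not Flask's own overhead:

```python
import json
import tracemalloc


def peak_bytes_during_parse(body: bytes) -> int:
    """Return the peak number of bytes allocated while parsing `body`.

    `body` itself is created before tracing starts, so it is not counted.
    """
    tracemalloc.start()
    try:
        json.loads(body)  # decodes the bytes to str, then builds the objects
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # A small stand-in for the ~300 MB document from the question.
    xml_string = "<item>text</item>" * 100_000
    body = json.dumps({"file_content": xml_string, "self_id": "FOO"}).encode("utf-8")
    peak = peak_bytes_during_parse(body)
    print(f"payload: {len(body)} bytes, peak during parse: {peak} bytes")
```

The input bytes, the decoded string and the parsed value all exist at the same time, so I would expect the peak to be well above the payload size, on top of the payload itself. The exact number depends on the Python version and on the content of the document.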

I think you are doing something wrong: the get_json function in Flask has the signature get_json(force=False, silent=False, cache=True). You don't need to pass your data to it, because you are calling it on the request object. In request.get_json(request.data), the data just ends up as the force argument. Also, you probably don't want to cache the result in memory for multiple calls.

Try request.get_json(cache=False); I guess memory usage will go down by a few hundred MB.

Additionally, I think the JSON functions are known to use a lot of memory: https://blog.ionelmc.ro/2015/11/22/memory-use-and-speed-of-json-parsers/

Upvotes: 1
