Reputation: 408
I have a Flask app (Flask version 1.0.2) that processes XML documents. The documents are POSTed as JSON, like this:
import requests
import sys
import json

with open('big.xml', encoding="utf-8") as f:
    xml_string = f.read()
print(sys.getsizeof(xml_string) // 1024 // 1024)
# 283

gid = "FOO"
json_data = json.dumps({"file_content": xml_string, "self_id": gid})
print(sys.getsizeof(json_data) // 1024 // 1024)
# 305

result_json = requests.post("http://my_server:8080/api", data=json_data, headers={"Content-Type": "application/json"})
As you can see, the XML files can be quite big, around 300 MB in this example.
A simplified version of my Flask app looks like this:
from flask import Flask, request, jsonify
from memory_profiler import profile

app = Flask(__name__)

@app.route('/api', methods=['POST'])
@profile
def api():
    input_data = request.get_json()
    output_data = {"id": "FOO"}
    response = jsonify(output_data)
    return response

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080, debug=True)
While the request is being posted, the memory usage of the Flask app spikes to ~2.8 GB. The memory profiling in the code above reports nowhere near that:
Line #    Mem usage    Increment   Line Contents
================================================
     6     27.8 MiB     27.8 MiB   @app.route('/api', methods=['POST'])
     7                             @profile
     8                             def api():
     9    617.3 MiB    589.5 MiB       input_data = request.get_json(request.data)
    10    617.3 MiB      0.0 MiB       output_data = {"id": "FOO"}
    11    617.3 MiB      0.0 MiB       response = jsonify(output_data)
    12    617.3 MiB      0.0 MiB       return response
What am I missing? What causes this big memory spike, and how can I deal with it?
Upvotes: 1
Views: 2814
Reputation: 10433
I guess you could save a lot of memory if you didn't wrap the XML in a JSON structure and sent the extra information some other way, for example in headers.
Functions like get_json and jsonify are convenient, but not optimized for low memory usage. They probably copy the data before processing, so it ends up in memory multiple times.
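As a rough sketch of that idea (not a tested drop-in for the app above), the request body can be copied to a temporary file in small chunks instead of being loaded as one big string. The helper name spool_body, the chunk size, and the X-Self-Id header are made up for illustration; request.stream and request.headers are the Flask attributes this assumes you would read from.

```python
import io
import shutil
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune as needed


def spool_body(stream, chunk_size=CHUNK_SIZE):
    """Copy a file-like request body to a temporary file in fixed-size chunks.

    Returns the open temporary file, rewound to the start, so only one
    chunk at a time has to be held in memory while copying.
    """
    spool = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, spool, chunk_size)
    spool.seek(0)
    return spool


# Hypothetical Flask usage (the X-Self-Id header name is an assumption):
#
#     @app.route('/api', methods=['POST'])
#     def api():
#         gid = request.headers.get("X-Self-Id")
#         with spool_body(request.stream) as xml_file:
#             ...  # parse xml_file incrementally
#         return jsonify({"id": gid})
#
# Hypothetical client side: passing the open file object as data lets
# requests stream the upload instead of building one large string.
#
#     with open('big.xml', 'rb') as f:
#         requests.post(url, data=f,
#                       headers={"Content-Type": "application/xml",
#                                "X-Self-Id": "FOO"})

if __name__ == "__main__":
    # Small in-memory stand-in for a request body.
    demo = io.BytesIO(b"<root>" + b"<item/>" * 1000 + b"</root>")
    with spool_body(demo) as spooled:
        print(len(spooled.read()))
```

Note that this only helps if nothing else in the handler reads request.data or calls get_json first, since those consume the whole body.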
I also think you are calling it incorrectly. The get_json function in Flask has the following signature: get_json(force=False, silent=False, cache=True).
You don't need to pass your data to it, because you are calling it on the request object (a positional argument such as request.data is just taken as force). Also, you probably don't want to cache the result in memory for multiple calls.
Try request.get_json(cache=False), and I guess memory usage will go down by a few hundred MB.
Additionally, I think the JSON functions are known to use a lot of memory: https://blog.ionelmc.ro/2015/11/22/memory-use-and-speed-of-json-parsers/
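If you want to check this on your own data, one possible way (a sketch using the standard library's tracemalloc, not the memory_profiler setup from the question) is to record the peak allocation while json.loads runs. The helper name peak_during_loads and the small generated payload are assumptions for illustration; the real numbers depend on the document and the Python version.

```python
import json
import tracemalloc


def peak_during_loads(payload):
    """Return (parsed object, peak bytes allocated while json.loads ran)."""
    tracemalloc.start()
    try:
        parsed = json.loads(payload)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return parsed, peak


if __name__ == "__main__":
    # Small stand-in for the large XML document from the question.
    xml_string = "<row>é</row>\n" * 100_000
    payload = json.dumps({"file_content": xml_string, "self_id": "FOO"})
    parsed, peak = peak_during_loads(payload)
    print(f"payload: {len(payload)} chars, peak during parse: {peak} bytes")
```

The peak reported here covers only allocations made by the parser itself; the raw request body and any cached copies held by the framework come on top of it.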
Upvotes: 1