Reputation: 31166
I'm storing pandas dataframes in Redis, serialising them with pyarrow. This is working well. I want to make this data available to Jupyter notebooks via Flask. It works fine on localhost but fails when running on AWS EB.
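For context, a minimal sketch of the caching side, assuming pyarrow's serialize_pandas/deserialize_pandas helpers (current in the 0.14-0.16 releases involved here) and a plain redis-py client; the key names are illustrative:

import pandas as pd
import pyarrow as pa
import redis

r = redis.Redis()  # connection details are illustrative

def cache_df(key, df):
    # serialize_pandas returns a pyarrow Buffer; store its raw bytes
    r.set(key, pa.serialize_pandas(df).to_pybytes())
    r.set(f"{key}.type", "pandas.DataFrame")  # type hint returned as a header

def fetch_df(key):
    # deserialize_pandas accepts bytes/buffer-like input
    return pa.deserialize_pandas(r.get(key))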
Flask code
from io import BytesIO
from flask import Response

@app.route('/cacheget/<path:key>', methods=['GET'])
def cacheget(key):
    c = mycache()  # application-specific Redis wrapper
    # stream the raw pyarrow bytes straight through, untouched
    resp = Response(BytesIO(c.redis().get(key)), mimetype="text/plain",
                    direct_passthrough=True)
    resp.headers["key"] = key
    resp.headers["type"] = c.redis().get(f"{key}.type")
    return resp
Jupyter tests against Flask running on localhost and on AWS EB
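The test itself isn't shown above; a sketch of what it would look like from the notebook, using requests to pull the bytes back and pyarrow to rebuild the frame (URL and key are placeholders):

import pyarrow as pa
import requests

url = "http://localhost:5000/cacheget/mykey"  # swap the host for the EB endpoint
resp = requests.get(url)
print(resp.headers.get("key"), resp.headers.get("type"), len(resp.content))

# rebuild the dataframe from the raw pyarrow bytes
df = pa.deserialize_pandas(resp.content)
df.head()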
I suspect the bytes content is somehow incomplete by the time pyarrow deserialises it, but I cannot find any evidence of this, nor any related posts. I am considering switching from pyarrow-serialised data on the wire to JSON, i.e. in the Flask route, converting the serialised bytes back to a pandas dataframe and then to JSON. That, however, will be at least 10x bigger on the wire.
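The JSON fallback under consideration would look roughly like this in the route (a sketch; deserialize_pandas stands in for whatever mycache uses to round-trip the frame):

@app.route('/cacheget_json/<path:key>', methods=['GET'])
def cacheget_json(key):
    c = mycache()
    # rebuild the dataframe from the cached pyarrow bytes, then re-emit as JSON text
    df = pa.deserialize_pandas(c.redis().get(key))
    return Response(df.to_json(orient="split"), mimetype="application/json")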
Are my HTTP headers correctly set for this? Are there any known issues with sending bytes data like this over the wire?
Upvotes: 0
Views: 771
Reputation: 31166
The issue was incompatible versions of pyarrow: the AWS EB instance was running 0.14.1 and the Jupyter client 0.16.0. I downgraded the client to 0.14.1 and reset the Redis caches on localhost so that the pandas dataframes are serialised in the local Redis cache using pyarrow 0.14.1. base64 encoding is not necessary and increases the payload by at least 20%. I arrived at this conclusion by calling sys.getsizeof() on the payload in Flask, putting the result in a header, and then doing the same on the bytes data read in Jupyter.
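A quick way to catch this class of problem, sketched below: compare the pyarrow version and the payload size on both ends (the "size" header name is my own, not part of the original code):

import sys
import pyarrow as pa
import requests

# both ends must agree on this before deserialising anything
print("client pyarrow:", pa.__version__)

# on the Flask side, the route would add e.g.
#   resp.headers["size"] = str(sys.getsizeof(data))
resp = requests.get("http://localhost:5000/cacheget/mykey")  # placeholder URL
print("server size:", resp.headers.get("size"))
print("client size:", sys.getsizeof(resp.content))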
Upvotes: 1