Reputation: 14596
I have a function posting to a server (AWS Lambda) to perform OCR of a base64 image:
def image_to_text(image64):
    url = base_url + 'text-to-image'
    data = json.dumps({'image64': image64})
    r = requests.post(url, data)
    r.raise_for_status()
    return r.json()['text'].encode('utf-8')
The function works: calling image_to_text('some long string') returns a proper response.
Problem: Using image_to_text in parallel (multi-process) makes the application halt (without warnings or errors) at r = requests.post(url, data)
Example:
import multiprocessing as mp
from multiprocessing import cpu_count

p = mp.Pool(cpu_count())
p.map(image_to_text, ('A long string',
                      'Another long string'))
p.terminate()
Question: Why is my application halting and how can I use multiprocessing
to fetch data with requests?
Upvotes: 1
Views: 540
Reputation: 15030
Your application is probably not halting; it is slowing down significantly due to the large amount of data you are transferring through the Pool. What you perceive as a "halt" is actually heavy IPC overhead.
From the multiprocessing programming guidelines:
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data between processes.
The Pool relies on an internal pipe to transfer the data to the workers which execute your image_to_text function. This pipe becomes a bottleneck when the amount of data to deliver is large. In your case, you are sending the data back and forth, doubling the amount of bytes which need to be serialized and shipped.
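You can get a feel for that serialization cost yourself: Pool.map pickles every argument before pushing it through the pipe to a worker, so a sketch like this (the 4 MB payload size is just an assumed, plausible size for a base64-encoded photo) shows how many bytes travel per task before any real work starts:

```python
import pickle

# Roughly the size of a base64-encoded photo (~4 MB of text).
image64 = 'A' * 4_000_000

# Pool.map pickles each argument before writing it to the pipe
# feeding the worker; the return value travels back the same way.
arg_bytes = len(pickle.dumps(image64))
print(arg_bytes)  # several MB of serialization and IPC per task
```

With a handful of images of that size, the pipe traffic dwarfs the actual HTTP request, which is why the workers appear stuck.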
I'd recommend dumping the data to temporary files and sending image_to_text only the file names. image_to_text can then open and read the data from the files autonomously. You will notice your logic becoming significantly faster and more robust as well.
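A minimal sketch of that approach: dump_to_tempfile and image_to_text_from_file are hypothetical names, base_url and the 'text-to-image' endpoint mirror the question's code, and requests is imported inside the worker function so only the short path string crosses the Pool's pipe:

```python
import json
import tempfile

base_url = 'https://example.invalid/'  # placeholder for the real endpoint


def dump_to_tempfile(image64):
    # Write the base64 payload to disk and hand back only the file
    # name, so the Pool pipe carries a short string, not megabytes.
    with tempfile.NamedTemporaryFile('w', suffix='.b64',
                                     delete=False) as f:
        f.write(image64)
        return f.name


def image_to_text_from_file(path):
    # Runs in the worker: read the payload locally, then POST it
    # exactly as the original image_to_text did.
    import requests  # third-party; imported in the worker process
    with open(path) as f:
        image64 = f.read()
    r = requests.post(base_url + 'text-to-image',
                      data=json.dumps({'image64': image64}))
    r.raise_for_status()
    return r.json()['text']
```

The parent then maps over file names instead of payloads, e.g. pool.map(image_to_text_from_file, [dump_to_tempfile(s) for s in images]), keeping IPC traffic negligible regardless of image size.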
Upvotes: 1