Reputation: 13
We have a Python script that automates batch processing of time-series image data downloaded from the internet. The current script requires all the data to be downloaded before it runs, which wastes time. We want to modify it by writing a scheduler that calls the script as soon as each individual file has finished downloading. How can we tell, using Python, that a file has been downloaded completely?
Upvotes: 0
Views: 666
Reputation: 6227
If you download the file with Python, then you can just do the image processing operation after the file download operation finishes. An example using requests:
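import requests
import mymodule  # The module containing your custom image-processing function

for img in ("foo.png", "bar.png", "baz.png"):
    # requests.get returns only after the whole response body has been received
    response = requests.get("http://www.example.com/" + img)
    response.raise_for_status()  # stop on an HTTP error instead of processing an error page
    image_bytes = response.content
    mymodule.process_image(image_bytes)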
However, with the sequential approach above you will spend a lot of time waiting for responses from the remote server. To make this faster, you can download and process multiple files at once using asyncio and aiohttp. There's a good introduction to downloading files this way in Paweł Miech's blog post Making 1 million requests with python-aiohttp. The code you need will look something like the example at the bottom of that blog post (the one with the semaphore).
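As a rough illustration of that pattern (a minimal sketch, not the code from the blog post), the snippet below limits the number of simultaneous downloads with an asyncio.Semaphore and processes each file as soon as its own download completes. The names run_all, fetch_and_process, main and max_concurrent are made up for this example, and process stands in for something like mymodule.process_image. aiohttp is a third-party package and is imported only inside main.

```python
import asyncio


async def fetch_and_process(url, fetch, process, semaphore):
    # The semaphore caps how many downloads are in flight at the same time.
    async with semaphore:
        data = await fetch(url)  # returns once this file is fully downloaded
    # Process this file right away, without waiting for the other downloads.
    return process(data)


async def run_all(urls, fetch, process, max_concurrent=10):
    # Hypothetical helper: `fetch` is any coroutine function mapping url -> bytes.
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_and_process(url, fetch, process, semaphore) for url in urls]
    # gather returns the results in the same order as `urls`.
    return await asyncio.gather(*tasks)


async def main(urls, process):
    import aiohttp  # third-party; assumed to be installed

    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.read()  # the complete body as bytes

        return await run_all(urls, fetch, process)
```

Calling it would look something like asyncio.run(main(["http://www.example.com/foo.png"], mymodule.process_image)). Note that process runs synchronously inside the event loop here, so if the image processing is CPU-heavy it will hold up the other downloads while it runs; handing it off to an executor is one possible way around that.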
Upvotes: 1