Reputation: 121
I wrote the following function and tested it in the Python shell, and the images downloaded successfully. However, when I ran it as a script, no images were downloaded.
import os
import requests
from time import time
import uuid
from multiprocessing.pool import ThreadPool

main_file_name = 'test1.csv'
my_set = set()

with open(main_file_name, 'r') as f:  # read image urls
    for row in f:
        my_set.add(row.split(',')[2].strip())

def get_url(entry):
    path = str(uuid.uuid4()) + ".jpg"
    if not os.path.exists(path):
        r = requests.get(entry, stream=True)
        if r.status_code == 200:
            with open(path, 'wb') as f:
                for chunk in r:
                    f.write(chunk)

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
print(f"Elapsed Time: {time() - start}")
I double-checked and it works in the shell. Is there anything I am missing in the script?
Upvotes: 0
Views: 508
Reputation: 1487
"results" is of class multiprocessing.pool.IMapUnorderedIterator
, a good way to make sure the URLs are downloaded is to actually loop on results
start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
for _ in results:  # consuming the iterator waits for every task
    pass
print(f"Elapsed Time: {time() - start}")
Another method that will also do the trick is to make sure the main thread does not exit before the downloads complete, for example by using time.sleep:
from time import sleep

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
sleep(10)  # make sure this amount is enough to finish downloading
print(f"Elapsed Time: {time() - start}")
The reason your script doesn't work is that it ends immediately after results is created: imap_unordered returns right away, without waiting for the worker threads. The reason python3 -i test.py (or simply copy-pasting your code into the shell) works is that the interpreter is not killed (the main thread stays alive), so the images have time to be downloaded.
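If you prefer not to consume the iterator manually, a minimal sketch (assuming get_url and my_set from your code) is to use the pool as a context manager with map, which blocks until every task has completed:

from multiprocessing.pool import ThreadPool
from time import time

start = time()
with ThreadPool(8) as pool:
    pool.map(get_url, my_set)  # map blocks until all downloads finish
print(f"Elapsed Time: {time() - start}")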
Upvotes: 1