Reputation: 1848
Here's the detailed question:
I want to use multiple threads to run a batch of HTTP requests, then gather all the results into a list and sort the items.
So I want to define an empty list, origin_list,
in the main thread first, and then start some threads that simply append their results to this list, passing origin_list
to every thread.
It seems that I get the expected results in the end, so I think I get the correct result list without a thread lock because the list is a mutable object. Am I right?
My main code is below:
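import requests
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def do_request_work(final_item_list, request_url):
    request_results = requests.get(request_url).text
    # do request work
    final_item_list.append(request_results)

def do_sort_work(final_item_list):
    # do sort work
    return final_item_list

def main():
    f_item_list = []
    request_list = [url1, url2, ...]
    with ThreadPoolExecutor(max_workers=20) as executor:
        # the iterator returned by map() is discarded, so any exception
        # raised inside do_request_work is silently dropped
        executor.map(partial(do_request_work, f_item_list), request_list)
    # leaving the with-block waits for all submitted requests to finish
    sorted_list = do_sort_work(f_item_list)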
Any comments are very welcome. Many thanks.
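For comparison, here is a minimal sketch of the same fan-out without any shared list. It assumes a hypothetical `fetch` function standing in for `requests.get(url).text` (it just tags each URL with its length so the sketch runs offline). `executor.map` returns the results in input order, so they can be collected and sorted after the pool has finished.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Hypothetical stand-in for requests.get(url).text so the sketch runs offline.
    return f"{url}:{len(url)}"


def gather_sorted(urls, max_workers=20):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as `urls`; consuming the
        # iterator also re-raises any exception a worker hit.
        results = list(executor.map(fetch, urls))
    # No shared mutable state: sorting happens after every worker is done.
    return sorted(results)


if __name__ == "__main__":
    urls = ["http://example.com/b", "http://example.com/a", "http://example.com/c"]
    print(gather_sorted(urls))
```

Each worker only returns a value, so the question of whether `append` needs a lock does not arise.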
Upvotes: 1
Views: 1027
Reputation: 586
Look at this thread: "I'm seeking advise on multi-tasking on Python36 platform, Procedure setup".
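Relevant to Python 3.5+ (from the asyncio documentation):
Running Tasks Concurrently
awaitable asyncio.gather(*aws, loop=None, return_exceptions=False)
(The loop parameter was deprecated in Python 3.8 and removed in 3.10.)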
Run awaitable objects in the aws sequence concurrently.
I use this very often; just be aware that it's not thread-safe, so don't change shared values inside the tasks, otherwise you will have to use deepcopy.
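A minimal sketch of `asyncio.gather`, assuming a hypothetical `fake_request` coroutine that awaits `asyncio.sleep` in place of a real HTTP call (a real version would need an asyncio-capable HTTP client, because `requests` blocks the event loop). `asyncio.run` needs Python 3.7+. The results come back in the order the awaitables were passed, not the order they finish.

```python
import asyncio


async def fake_request(url, delay):
    # Hypothetical stand-in for an HTTP request: simulate network latency.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(jobs):
    # All coroutines run concurrently on one event loop in a single thread.
    # gather() returns results in the order of `jobs`, not completion order.
    return await asyncio.gather(*(fake_request(url, delay) for url, delay in jobs))


if __name__ == "__main__":
    jobs = [("url1", 0.3), ("url2", 0.1), ("url3", 0.2)]
    print(asyncio.run(fetch_all(jobs)))
```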
Other things to look at:
Upvotes: 0
Reputation: 4098
I think this is quite a questionable solution, even without taking thread safety into account.
First of all, CPython (the standard Python implementation) has the GIL, which the docs describe as follows:
"In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once."
Thus, I doubt there is much performance benefit here. Even noting that
"potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL",
all Python bytecode will still be executed by one thread at a time.
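To see what the quoted exception for blocking operations means in practice, here is a minimal sketch that uses `time.sleep` as a stand-in for waiting on a network response (an assumption; real request timings also depend on the server). CPython releases the GIL while a thread sleeps or blocks on I/O, so the waits overlap even though bytecode still runs one thread at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(_):
    # time.sleep releases the GIL, much like a thread waiting on a socket.
    time.sleep(0.2)
    return True


def timed_run(n_tasks, max_workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        done = list(executor.map(blocking_task, range(n_tasks)))
    return len(done), time.perf_counter() - start


if __name__ == "__main__":
    count, serial = timed_run(10, max_workers=1)
    _, threaded = timed_run(10, max_workers=10)
    # Roughly 2.0 s with one thread versus roughly 0.2 s with ten threads.
    print(f"{count} tasks: 1 thread {serial:.2f}s, 10 threads {threaded:.2f}s")
```

CPU-bound pure-Python work would not overlap this way, which is where the GIL limits threads.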
On the other hand, the GIL itself may help you with thread safety here, since only one thread will modify final_item_list
at a time, but I am not sure.
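A minimal sketch for checking the append behaviour empirically. The Python FAQ entry "What kinds of global value mutation are thread-safe?" lists `L.append(x)` among the operations that are atomic in CPython, so concurrent appends should not lose items; compound read-modify-write steps such as `L[i] = L[i] + 1` are not atomic and would still need a `threading.Lock`. A passing run is evidence, not proof, and the guarantee is CPython-specific.

```python
import threading


def append_concurrently(n_threads=20, per_thread=10_000):
    shared = []

    def worker(thread_id):
        # Each thread appends its own tagged items to the same list, no lock.
        for i in range(per_thread):
            shared.append((thread_id, i))

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the list
    return shared


if __name__ == "__main__":
    items = append_concurrently()
    # Every item should appear exactly once; only the interleaving varies.
    print(len(items), len(set(items)))
```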
Anyway, I would use the multiprocessing module here, with its built-in parallel map:
import requests
from multiprocessing import Pool

def do_request_work(request_url):
    request_results = requests.get(request_url).text
    # do request work
    return request_results

if __name__ == '__main__':
    request_list = [url1, url2, ...]
    # each worker process returns its result; nothing is shared or mutated
    with Pool(20) as p:
        f_item_list = p.map(do_request_work, request_list)
This guarantees parallel, lock-free execution of the requests, since every process receives only its part of the work and simply returns the result when ready; Pool.map then collects the results in the same order as request_list.
Upvotes: 1