jia Jimmy

Reputation: 1848

Is it safe to append items to the same list from multiple threads without a lock?

Here are the details:

I want to use multiple threads to send a batch of HTTP requests, then gather all the results into one list and sort the items.

So I define an empty list, origin_list, in the main process first, pass it to every thread, and have each thread simply append its result to this list.

It seemed that I got the expected results in the end, so I think the final result list is correct even without a thread lock, since the list is a mutable object. Am I right?

My main code is below:

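import requests
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def do_request_work(final_item_list, request_url):
    # do request work
    request_results = requests.get(request_url).text
    # append this thread's result to the shared list
    final_item_list.append(request_results)


def do_sort_work(final_item_list):
    # do sort work
    return sorted(final_item_list)


def main():

    f_item_list = []
    request_list = [url1, url2, ...]

    with ThreadPoolExecutor(max_workers=20) as executor:
        # list() consumes the results so exceptions raised in workers are re-raised here
        list(executor.map(
            partial(
                do_request_work,
                f_item_list
                ),
            request_list))

    sorted_list = do_sort_work(f_item_list)
    return sorted_list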
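A self-contained sketch of the same pattern that runs offline. `fake_request` is a hypothetical stand-in for `requests.get(url).text` that only sleeps briefly, and the URLs are made up. The threads append to one shared list with no explicit lock, and the list is sorted afterwards because threads finish in an arbitrary order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fake_request(url):
    # Hypothetical stand-in for requests.get(url).text; sleeps to mimic network I/O.
    time.sleep(0.01)
    return f"result for {url}"


def do_request_work(final_item_list, request_url):
    # Each worker appends its result to the shared list (no explicit lock).
    final_item_list.append(fake_request(request_url))


def main():
    f_item_list = []
    # Hypothetical URLs used only for illustration.
    request_list = [f"https://example.com/item/{i}" for i in range(100)]

    with ThreadPoolExecutor(max_workers=20) as executor:
        # list() consumes the iterator so any worker exception is re-raised here.
        list(executor.map(partial(do_request_work, f_item_list), request_list))

    # Completion order is arbitrary, so sort afterwards.
    return sorted(f_item_list)


sorted_list = main()
print(len(sorted_list))  # 100
```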

Any comments are very welcome. Many thanks.

Upvotes: 1

Views: 1027

Answers (2)

Christo Goosen

Reputation: 586

Look at this thread: I'm seeking advise on multi-tasking on Python36 platform, Procedure setup.

Relevant to Python 3.5+:

Running Tasks Concurrently
awaitable asyncio.gather(*aws, loop=None, return_exceptions=False)
Run awaitable objects in the aws sequence concurrently.

I use this very often. Just be aware that it's not thread-safe, so do not change values inside; otherwise you will have to use deepcopy.
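A minimal sketch of `asyncio.gather`, assuming the HTTP call is replaced by a hypothetical `fake_fetch` coroutine that only awaits `asyncio.sleep` (a real version would need an async HTTP client, since `requests` is blocking). `gather` returns results in the same order as the awaitables passed to it, so no shared list is needed. The `loop` parameter in the signature quoted above was removed in Python 3.10, so it is not used here.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for an async HTTP request; just yields to the event loop.
    await asyncio.sleep(0.01)
    return f"result for {url}"


async def fetch_all(urls):
    # Run all coroutines concurrently; results come back in input order.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"https://example.com/item/{i}" for i in range(5)]
results = asyncio.run(fetch_all(urls))
print(results[0])  # result for https://example.com/item/0
```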

Other things to look at:

Upvotes: 0

ololobus
ololobus

Reputation: 4098

I think this is quite a questionable solution, even without taking thread safety into account.

First of all, Python has the GIL:

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.

Thus, I doubt there is much performance benefit here, even noting that

potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.

all Python work will still be executed by one thread at a time.
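A rough illustration of the quoted point that blocking calls happen outside the GIL. `time.sleep` stands in (as an assumption) for waiting on a network response: twenty threads each sleeping 0.2 s finish in well under the 4 s a sequential loop would need. CPU-bound pure-Python work would not overlap like this in a standard CPython build.

```python
import threading
import time


def blocking_io():
    # time.sleep releases the GIL, much like waiting on a socket does.
    time.sleep(0.2)


threads = [threading.Thread(target=blocking_io) for _ in range(20)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f} s for 20 x 0.2 s sleeps")
```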

On the other hand, the same lock may help you with thread safety here, since only one thread will modify final_item_list at a time, but I am not sure.
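For reference, the CPython FAQ lists `L.append(x)` among the operations that are atomic because of the GIL, but that is an implementation detail rather than a language guarantee. A minimal sketch of making the intent explicit with a `threading.Lock`; the `SafeCollector` class is a hypothetical helper, not part of any library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeCollector:
    """Hypothetical helper: a list whose appends are guarded by a lock."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append(self, item):
        # Only one thread at a time may mutate the underlying list.
        with self._lock:
            self._items.append(item)

    def sorted_items(self):
        # Take the lock so the snapshot is consistent.
        with self._lock:
            return sorted(self._items)


collector = SafeCollector()
with ThreadPoolExecutor(max_workers=20) as executor:
    # Consume the iterator so exceptions from workers are re-raised.
    list(executor.map(collector.append, range(1000)))

items = collector.sorted_items()
print(len(items))  # 1000
```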

Anyway, I would use the multiprocessing module here, with its built-in parallel map:

from multiprocessing import Pool

import requests


def do_request_work(request_url):
    # do request work
    request_results = requests.get(request_url).text
    return request_results


if __name__ == '__main__':
    request_list = [url1, url2, ...]

    # each worker process handles its share of URLs; results come back in input order
    with Pool(20) as p:
        f_item_list = p.map(do_request_work, request_list)

This guarantees parallel, lock-free execution of the requests, since each process receives only its share of the work and simply returns its result when ready.
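If you want to keep this map-and-return structure but use threads instead of processes (for example because the work is mostly waiting on the network), `multiprocessing.pool.ThreadPool` offers the same `map` interface as `Pool`. This is a swapped-in alternative, not what the answer above uses; `fake_request` is a hypothetical stand-in for `requests.get(url).text`, and the URLs are made up.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_request(url):
    # Hypothetical stand-in for requests.get(url).text.
    time.sleep(0.01)
    return f"result for {url}"


request_list = [f"https://example.com/item/{i}" for i in range(50)]

# Same map() interface as multiprocessing.Pool, but backed by threads.
with ThreadPool(20) as p:
    f_item_list = p.map(fake_request, request_list)

# map() returns results in input order, so no shared list or lock is needed.
print(f_item_list[:2])
```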

Upvotes: 1
