Is it right to append item into the same list by multi-thread without lock?

Question

Here's the detailed question:

I want to use multiple threads to run a batch of HTTP requests, then gather all the results into a list and sort the items.

So I want to define an empty list, origin_list, in the main process first, pass it to every thread, and have each thread simply append its result to this list.

It seems that I got the expected results in the end, so I think I ended up with the right result list without a thread lock, since the list is a mutable object. Am I right?

My main code is below:

from concurrent.futures import ThreadPoolExecutor
from functools import partial

import requests


def do_request_work(final_item_list, request_url):
    request_results = requests.get(request_url).text
    # do request work
    final_item_list.append(request_results)


def do_sort_work(final_item_list):
    # do sort work 
    return final_item_list


def main():

    f_item_list = []
    request_list = [url1, url2, ...]

    with ThreadPoolExecutor(max_workers=20) as executor:
        executor.map(
            partial(
                do_request_work,
                f_item_list
                ),
            request_list)

    sorted_list = do_sort_work(f_item_list)

Any comments are very welcome. Many thanks.

ololobus · Accepted Answer

I think this is a rather questionable solution, even without taking thread safety into account.

First of all, Python has the GIL:

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.

Thus, I doubt there is much performance benefit here, even noting that

potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.

all Python code will still be executed by one thread at a time.

On the other hand, the same lock may help with thread safety here, so that only one thread modifies final_item_list at a time, but I am not sure.

Anyway, I would use the multiprocessing module here, with its built-in parallel map:
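To make the thread-safety point concrete, here is a minimal sketch, assuming CPython, where a single list.append call is an atomic operation. The helper names append_many and run_append_demo are hypothetical and not part of the question's code. It only checks that no append is lost. The order of the items depends on thread scheduling, so a sort afterwards is still needed.

```python
import threading


def append_many(shared, worker_id, count):
    # Each worker appends `count` tagged items to the same shared list.
    for i in range(count):
        shared.append((worker_id, i))


def run_append_demo(num_threads=8, per_thread=10_000):
    shared = []
    threads = [
        threading.Thread(target=append_many, args=(shared, w, per_thread))
        for w in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished appending
    return shared


if __name__ == "__main__":
    items = run_append_demo()
    print(len(items))  # compare with num_threads * per_thread
```

Note that this covers only a single append. A compound update such as reading an element, changing it, and writing it back is not atomic and would still need a lock.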

from multiprocessing import Pool

import requests

def do_request_work(request_url):
    request_results = requests.get(request_url).text
    # do request work
    return request_results

if __name__ == '__main__':
    request_list = [url1, url2, ...]

    with Pool(20) as p:
        f_item_list = p.map(do_request_work, request_list)

This gives you parallel, lock-free execution of the requests, since every process receives only its own part of the work and simply returns the result when it is ready.
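The same "return the result instead of appending to a shared list" pattern can also be sketched with the ThreadPoolExecutor from the question. This is an illustrative variant and not the code from the answer: fake_request is a hypothetical stand-in for requests.get(url).text so that the example runs without network access, and the example.com URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(url):
    # Hypothetical stand-in for requests.get(url).text;
    # the sleep imitates network latency.
    time.sleep(0.01)
    return "body of " + url


def fetch_all(urls, max_workers=20):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `urls`,
        # so no shared list and no lock are needed.
        return list(executor.map(fake_request, urls))


if __name__ == "__main__":
    urls = ["https://example.com/%d" % i for i in range(5)]  # placeholder URLs
    results = fetch_all(urls)
    print(results)
```

Consuming the iterator returned by executor.map (here via list) also re-raises any exception that occurred in a worker, which a version that only appends to a shared list and ignores the map result would silently drop.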
