David Mertz

Reputation: 323

Thread locking failing in dead-simple example

This is the simplest toy example. I know about concurrent.futures and higher level code; I'm picking the toy example because I'm teaching it (as part of same material with the high-level stuff).

It increments counter from different threads, and the results are strange. Usually I get a counter less than I should (it should be 5,000,000), often much less, like 20k. But as I decrease the number of loops, at some number like 1,000 it is consistently right. At some intermediate number it is almost right, occasionally correct, but once in a while slightly larger than the product of nthread x nloop. I am running it repeatedly in a Jupyter cell, but the assignment at the top of the cell really should reset counter to zero, not keep any old total.

import threading
from threading import Thread

lock = threading.Lock()
counter, nthread, nloop = 0, 100, 50_000

def increment(n, lock):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock))
    t.start()
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")

If I add .join() the behavior changes, but is still not correct. For example, in the version that doesn't try to lock:

counter, nthread, nloop = 0, 100, 50_000 

def increment(n):
    global counter
    for _ in range(n):
        counter += 1

for _ in range(nthread):
    t = Thread(target=increment, args=(nloop,))
    t.start()
    t.join()
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
# --> 50,000 loops X 100 threads -> counter is 5,022,510

The exact overcount varies, but I see something like that repeatedly.

I don't really want to .join() in the lock example, because I want to illustrate the idea of a background job. But I can wait on the threads' aliveness (thank you Frank Yellin!), and that fixes the lock case, as sketched below. The overcount still troubles me though.
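Roughly what I mean by waiting on aliveness is something like the following sketch (keeping the thread handles in a list is my addition here, and the polling interval is arbitrary):

import time

threads = [Thread(target=increment, args=(nloop, lock)) for _ in range(nthread)]
for t in threads:
    t.start()

# Poll aliveness instead of joining; sleep a little between checks.
while any(t.is_alive() for t in threads):
    time.sleep(0.01)

print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")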

Upvotes: 0

Views: 66

Answers (1)

Frank Yellin

Reputation: 11285

You're not waiting until all your threads are done before looking at counter. That's also why you're getting your result so quickly.

    threads = []
    for _ in range(nthread):
        t = threading.Thread(target=increment, args=(nloop, lock))
        t.start()
        threads.append(t)

    for thread in threads:
        thread.join()

    print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")

prints out the expected result.

50,000 loops X 100 threads -> counter is 5,000,000

Updated. I highly recommend using ThreadPoolExecutor() instead, which takes care of tracking the threads for you.

    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor() as executor:
        for _ in range(nthread):
            executor.submit(increment, nloop, lock)
    print(...)

will give you the answer you want, and takes care of waiting for the threads.
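If you also want to surface exceptions raised inside the workers, a variant along these lines (a sketch, assuming increment, nloop, nthread, and lock are defined as in the question) keeps the futures and checks their results:

    from concurrent.futures import ThreadPoolExecutor, wait

    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(increment, nloop, lock) for _ in range(nthread)]
        wait(futures)        # explicit wait; the `with` block would also wait on exit
        for future in futures:
            future.result()  # re-raises any exception raised inside a worker
    print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")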

Upvotes: 2
