Reputation: 187
In §7.5.1 of The Art of Multiprocessor Programming by Herlihy et al. (2nd ed., 2020), the authors present a simple lock that uses an array queue to achieve FIFO locking. Intuitively, the nth thread gets a (thread-local) index into an array and then spins on that array element until thread n - 1 releases the lock. Its code looks like this:
import java.util.concurrent.atomic.AtomicInteger;

public class ALock {
    ThreadLocal<Integer> mySlotIndex = new ThreadLocal<>() {
        @Override protected Integer initialValue() { return 0; }
    };
    AtomicInteger tail;
    volatile boolean[] flag;
    int size;

    public ALock(int capacity) {
        size = capacity;
        tail = new AtomicInteger(0);
        flag = new boolean[capacity];
        flag[0] = true;
    }

    public void lock() {
        int slot = tail.getAndIncrement() % size;
        mySlotIndex.set(slot);
        while (!flag[slot]) {}
    }

    public void unlock() {
        int slot = mySlotIndex.get();
        flag[slot] = false;
        flag[(slot + 1) % size] = true;
    }
}
I am using a minimal test program to check that this lock is fair. In a nutshell, I create NUM_THREADS threads and map each one to an array index id. Each thread tries to acquire the same lock. Once it succeeds, it increments a global COUNT and also increments RUNS_PER_THREAD[id].

If the lock is correct, the final value of COUNT should equal the sum of the values in RUNS_PER_THREAD. If the lock is fair, the elements of RUNS_PER_THREAD should be approximately equal.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

public class Main {
    static long COUNT = 0;
    static int NUM_THREADS = 16;
    // static Lock LOCK = new ReentrantLock(true);
    static ALock LOCK = new ALock(NUM_THREADS);
    static long[] RUNS_PER_THREAD = new long[NUM_THREADS];
    static Map<Long, Integer> THREAD_IDS = new HashMap<>();

    public static void main(String[] args) {
        var threads = IntStream.range(0, NUM_THREADS).mapToObj(Main::makeWorker).toArray(Thread[]::new);
        for (int i = 0; i < threads.length; i++) THREAD_IDS.put(threads[i].getId(), i);
        for (var thread : threads) thread.start();
        try { Thread.sleep(300L); } catch (InterruptedException e) {}
        for (var thread : threads) thread.interrupt();
        try { Thread.sleep(100L); } catch (InterruptedException e) {}
        for (int i = 0; i < NUM_THREADS; i++) System.out.printf("Thread %d:\t%12d%n", i, RUNS_PER_THREAD[i]);
        System.out.println("Counted up to: \t\t\t" + COUNT);
        System.out.println("Sum for all threads: \t" + Arrays.stream(RUNS_PER_THREAD).sum());
    }

    private static Thread makeWorker(int i) {
        return new Thread(() -> {
            while (true) {
                if (Thread.interrupted()) return;
                LOCK.lock();
                try {
                    COUNT++;
                    var id = THREAD_IDS.get(Thread.currentThread().getId());
                    RUNS_PER_THREAD[id]++;
                } finally {
                    LOCK.unlock();
                }
            }
        });
    }
}
If the test program is run with a fair ReentrantLock, the final count of runs per thread with 16 threads (on my M1 Max Mac with Java 17) is almost exactly equal. If the same test is run with ALock, the first few threads seem to acquire the lock approximately 10 times more frequently than the last few threads.

Is ALock, as presented, unfair, and if so, why? Alternatively, is my minimal test flawed, and if so, why does it seem to demonstrate the fairness of ReentrantLock?
Upvotes: 0
Views: 165
Reputation: 15116
Your test code has a non-thread-safe update in COUNT++. Declare the counter as

static AtomicLong COUNT = new AtomicLong();

and switch the increment to COUNT.incrementAndGet().
ALock will give unfair results, especially when the number of threads exceeds the number of CPUs. The implementation relies on the high-CPU spin loop while (!flag[slot]), and not all threads get the same opportunity to enter their lock spin loops: the first few threads perform more of the lock-unlock cycles. Adding Thread.yield() should balance out the threads' access to the boolean array so that all of them have a similar opportunity to run through their own spin loop:

while (!flag[slot]) {
    Thread.yield();
}
You should see different results if you set NUM_THREADS to be at most Runtime.getRuntime().availableProcessors(): in that case the use of Thread.yield() may make no difference, in contrast to the case where NUM_THREADS > Runtime.getRuntime().availableProcessors().
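To make that comparison explicit, the test could cap its thread count at the core count before constructing the lock (a sketch; the numThreads variable is hypothetical and not part of the original test):

```java
public class CoreCountCheck {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Keep the spinners at or below the core count so every one of them
        // can actually run; above this, yield/scheduling effects dominate.
        int numThreads = Math.min(16, cores);
        System.out.println("cores=" + cores + ", threads=" + numThreads);
    }
}
```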
Using this lock class will also lead to slower throughput, because at any one time up to N-1 threads sit in a high-CPU spin loop waiting for the current lock holder to call unlock(). In an ideal lock implementation, the N-1 waiters would not be consuming CPU.
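As a rough sketch (my own, not from the book) of what "waiters don't consume CPU" can look like, the waiting threads can be queued and parked with LockSupport instead of spinning. This deliberately omits interruption and timeout handling, which a production lock (such as the AQS-based ReentrantLock) must deal with:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch only: a FIFO lock whose waiters park instead of spinning.
class ParkingFifoLock {
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        Thread current = Thread.currentThread();
        waiters.add(current);                       // join the FIFO queue
        // Wait until we are at the head AND can grab the lock; park() may
        // return spuriously, so the condition is rechecked in a loop.
        while (waiters.peek() != current || !locked.compareAndSet(false, true)) {
            LockSupport.park(this);
        }
        waiters.remove();                           // lock held; leave the queue
    }

    public void unlock() {
        locked.set(false);
        Thread next = waiters.peek();               // longest waiter, if any
        if (next != null) LockSupport.unpark(next);
    }
}
```

Because an unpark() issued before the corresponding park() makes that park() return immediately, the unlock/lock handoff cannot lose a wakeup.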
The ALock locking strategy also only works if exactly as many threads are used as were provided in new ALock(NUM_THREADS), because otherwise int slot = tail.getAndIncrement() % size; may hand the same slot to two threads.
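A minimal illustration of that wraparound (not part of the original code): with a capacity of 2 and three lock() calls, the first and third caller receive the same slot.

```java
public class SlotCollision {
    public static void main(String[] args) {
        int size = 2; // as in new ALock(2)
        // tail.getAndIncrement() would return 0, 1, 2 for three callers
        for (int ticket = 0; ticket < 3; ticket++) {
            System.out.println("ticket " + ticket + " -> slot " + (ticket % size));
        }
        // Tickets 0 and 2 both map to slot 0, so two threads would spin on
        // (and later flip) the same flag, breaking mutual exclusion.
    }
}
```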
Note that any code that relies on a spin loop or Thread.yield() to work is not an effective implementation and should not be used in production code. Both can be avoided with the classes of java.util.concurrent.*.
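For example, besides the fair ReentrantLock already commented out in the question's test, a fair Semaphore gives FIFO ordering with parked (non-spinning) waiters. A sketch of how the test's critical section could use it (class and field names are my own):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class FairSemaphoreCounter {
    // Fair binary semaphore: waiters queue FIFO and are parked, not spinning.
    static final Semaphore SEM = new Semaphore(1, true);
    static final AtomicLong COUNT = new AtomicLong();

    static void increment() {
        SEM.acquireUninterruptibly();   // blocks without burning CPU
        try {
            COUNT.incrementAndGet();    // critical section
        } finally {
            SEM.release();              // hands the permit to the longest waiter
        }
    }
}
```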
Upvotes: 1