Reputation: 25454
Is there an ExecutorService that is suitable for a huge number of very short-lived tasks? I envision something that internally tries busy waiting before switching over to synchronized waiting. Keeping the order of the tasks is not important, but it should be possible to enforce memory consistency (all tasks happen-before the point where the main thread regains control).
The test posted below consists of 100,000 tasks that each generate 100 doubles in a row. It accepts the size of the thread pool as a command-line parameter and always tests the serial version against the parallel one. (If no command-line argument is given, only the serial version is tested.) The parallel version uses a thread pool of fixed size; allocating the tasks is not even part of the time measurement. Still, the parallel version is never faster than the serial version, and I've tried up to 80 threads (on a machine with 40 hyperthreaded cores). Why?
import java.util.ArrayList;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorPerfTest {
    public static final int TASKS = 100000;
    public static final int SUBTASKS = 100;

    static final ThreadLocal<Random> R = new ThreadLocal<Random>() {
        @Override
        protected synchronized Random initialValue() {
            return new Random();
        }
    };

    public class SeqTest implements Runnable {
        @Override
        public void run() {
            Random r = R.get();
            for (int i = 0; i < TASKS; i++)
                for (int j = 0; j < SUBTASKS; j++)
                    r.nextDouble();
        }
    }

    public class ExecutorTest implements Runnable {
        private final class RandomGenerating implements Callable<Double> {
            @Override
            public Double call() {
                double d = 0;
                Random r = R.get();
                for (int j = 0; j < SUBTASKS; j++)
                    d = r.nextDouble();
                return d;
            }
        }

        private final ExecutorService threadPool;
        private ArrayList<Callable<Double>> tasks = new ArrayList<Callable<Double>>(TASKS);

        public ExecutorTest(int nThreads) {
            threadPool = Executors.newFixedThreadPool(nThreads);
            for (int i = 0; i < TASKS; i++)
                tasks.add(new RandomGenerating());
        }

        public void run() {
            try {
                threadPool.invokeAll(tasks);
            } catch (InterruptedException e) {
                e.printStackTrace();
            } finally {
                threadPool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorPerfTest executorPerfTest = new ExecutorPerfTest();
        if (args.length > 0)
            executorPerfTest.start(new String[]{});
        executorPerfTest.start(args);
    }

    private void start(String[] args) {
        final Runnable r;
        if (args.length == 0) {
            r = new SeqTest();
        } else {
            final int nThreads = Integer.parseInt(args[0]);
            r = new ExecutorTest(nThreads);
        }
        System.out.printf("Starting\n");
        long t = System.nanoTime();
        r.run();
        long dt = System.nanoTime() - t;
        System.out.printf("Time: %.6fms\n", 1e-6 * dt);
    }
}
Upvotes: 4
Views: 2544
Reputation: 1245
The call to Executors.newFixedThreadPool(nThreads) will create a ThreadPoolExecutor that reads tasks from a LinkedBlockingQueue, i.e. all threads in the executor lock on the same queue to retrieve the next task.
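For reference, the Javadoc for Executors.newFixedThreadPool spells out the equivalent ThreadPoolExecutor construction, which makes the single shared queue explicit:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class FixedPoolEquivalent {
    // Equivalent of Executors.newFixedThreadPool(nThreads), per its Javadoc:
    // nThreads workers, all taking tasks from one shared LinkedBlockingQueue.
    static ExecutorService newFixedThreadPool(int nThreads) {
        return new ThreadPoolExecutor(
                nThreads, nThreads,                   // core size == max size: a fixed pool
                0L, TimeUnit.MILLISECONDS,            // idle workers never time out
                new LinkedBlockingQueue<Runnable>()); // the queue every worker pulls from
    }
}

Every worker thread contends on that one queue's take lock when pulling its next task.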
Given the very small size of each task and the relatively large number of threads/CPUs you are quoting, it is most likely that your program runs slowly because of the high degree of lock contention and context switching this causes.
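One mitigation that stays within the executor framework (a sketch of mine, not part of the original answer; the Batching class and CHUNK value are made up for illustration) is to make each submitted task bigger, so that each queue hand-off is amortized over many units of work:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class Batching {
    // Hypothetical helper: wraps CHUNK of the original tiny tasks into one
    // Callable, so each queue hand-off pays for CHUNK units of work, not one.
    static final int CHUNK = 1000;

    static List<Callable<Void>> batch(final List<Runnable> tiny) {
        List<Callable<Void>> batched = new ArrayList<Callable<Void>>();
        for (int i = 0; i < tiny.size(); i += CHUNK) {
            final int from = i;
            final int to = Math.min(i + CHUNK, tiny.size());
            batched.add(new Callable<Void>() {
                public Void call() {
                    for (int j = from; j < to; j++)
                        tiny.get(j).run();
                    return null;
                }
            });
        }
        return batched;
    }
}

With TASKS = 100,000 and CHUNK = 1,000, invokeAll then performs about 100 queue hand-offs instead of 100,000, and when it returns it still establishes the happens-before relationship the question asks for.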
Note that the ReentrantLock implementation used by LinkedBlockingQueue already spins for a short period (up to approximately 1µs) while trying to acquire the lock before the thread gives up and blocks.
If your use case permits, you might want to try the Disruptor pattern instead; see http://lmax-exchange.github.com/disruptor/
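For a flavour of that API, here is a minimal sketch assuming Disruptor 3.x; the ValueEvent class, buffer size, and wait strategy are illustrative choices of mine, not prescribed by the library:

import java.util.concurrent.Executors;

import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.YieldingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

public class DisruptorSketch {
    // Illustrative event type: one pre-allocated slot in the ring buffer.
    static class ValueEvent {
        long value;
    }

    public static void main(String[] args) {
        int bufferSize = 1024; // ring size, must be a power of two

        Disruptor<ValueEvent> disruptor = new Disruptor<ValueEvent>(
                new EventFactory<ValueEvent>() {
                    public ValueEvent newInstance() { return new ValueEvent(); }
                },
                bufferSize,
                Executors.defaultThreadFactory(),
                ProducerType.SINGLE,          // a single publishing thread
                new YieldingWaitStrategy());  // consumers spin, then Thread.yield()

        disruptor.handleEventsWith(new EventHandler<ValueEvent>() {
            public void onEvent(ValueEvent event, long sequence, boolean endOfBatch) {
                // the short-lived unit of work goes here
            }
        });

        RingBuffer<ValueEvent> ring = disruptor.start();

        // Publishing: claim a slot, fill it in place, publish it --
        // no queue lock and no per-task allocation.
        for (int i = 0; i < 100000; i++) {
            long seq = ring.next();
            try {
                ring.get(seq).value = i;
            } finally {
                ring.publish(seq);
            }
        }

        disruptor.shutdown(); // returns once all published events are handled
    }
}

YieldingWaitStrategy is roughly what the question asks for: consumers busy-spin for a short while and then fall back to Thread.yield() rather than blocking on a lock.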
Upvotes: 2