Reputation: 888
I am working on a Java application for solving a class of numerical optimization problems - large-scale linear programming problems, to be more precise. A single problem can be split up into smaller subproblems that can be solved in parallel. Since there are more subproblems than CPU cores, I use an ExecutorService and define each subproblem as a Callable that gets submitted to the ExecutorService. Solving a subproblem requires calling a native library - a linear programming solver in this case.
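For readers who want something concrete, the setup described above might look roughly like the sketch below. It is not the actual application code: `solveSubproblem` is a hypothetical placeholder for the native solver call, and the class name, pool size and result type are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubproblemPoolSketch {

    // Hypothetical stand-in for the native LP solver call (assumption, not the real API).
    static double solveSubproblem(int id) {
        return id * 2.0;
    }

    // Submits one Callable per subproblem to a fixed-size pool and collects the results.
    static List<Double> solveAll(int subproblems, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int i = 0; i < subproblems; i++) {
                final int id = i;
                tasks.add(() -> solveSubproblem(id));
            }
            List<Double> results = new ArrayList<>();
            // invokeAll blocks until all tasks are done and returns futures in task order.
            for (Future<Double> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(solveAll(100, cores));
    }
}
```

With a fixed pool sized to the core count, the number of worker threads stays constant no matter how many subproblems are queued.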
Problem
I can run the application on Unix and on Windows systems with up to 44 physical cores and up to 256 GB of memory, but computation times on Windows are an order of magnitude higher than on Linux for large problems. Windows not only requires substantially more memory, but CPU utilization also drops over time, from 25% in the beginning to 5% after a few hours. Here is a screenshot of the task manager in Windows:
Observations
What I've tried
Questions
Upvotes: 18
Views: 4056
Reputation: 9206
If you are endlessly starting and finishing new threads, this might be the reason. Reuse threads by means of pools, for example with FixedThreadPool:
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    Future<String> future = executorService.submit(() -> "Hello World");
    // some operations
    String result = future.get();
Upvotes: 0
Reputation: 770
Would you please post the system statistics? Task Manager is good enough to provide some clues if that is the only tool available. It can easily tell whether your tasks are waiting for I/O, which sounds like the culprit based on what you described. It may be due to a memory management issue, or the library may be writing temporary data to disk, etc.
When you say 25% CPU utilization, do you mean that only a few cores are busy at the same time? (It could be that all the cores work from time to time, but not simultaneously.) Would you check how many threads (or processes) are really created in the system? Is that number always bigger than the number of cores?
If there are enough threads, are many of them idle, waiting for something? If so, you can pause the process (or attach a debugger) to see what they are waiting for.
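One way to get a first answer to these questions from inside the JVM is to count live threads by state. The sketch below is only an illustration (the class name is made up); note that a thread executing native code is typically reported as RUNNABLE even if it is blocked inside the library, so the JDK's `jstack <pid>` tool or a native profiler may still be needed to see where such threads actually are.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSketch {

    // Counts all live JVM threads grouped by their current Thread.State.
    static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Many WAITING or BLOCKED threads relative to RUNNABLE ones hint at contention or starvation.
        System.out.println(countStates());
    }
}
```

Calling `countStates()` periodically from a monitoring thread while the solver runs would show whether the pool threads gradually shift away from RUNNABLE as CPU utilization drops.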
Upvotes: 0
Reputation: 376
On Windows, the number of threads per process is limited by the address space of the process (see also Mark Russinovich - Pushing the Limits of Windows: Processes and Threads). I think this causes side effects when the process comes close to the limits (slower context switches, fragmentation...). On Windows, I would try to divide the workload among a set of processes. For a similar issue that I had years ago, I implemented a Java library to do this more conveniently (Java 8); have a look if you like: Library to spawn tasks in an external process.
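Without any library, the idea of dividing the workload among processes can be sketched with plain `ProcessBuilder`. This is not the API of the library mentioned above; the worker class name `com.example.SubproblemWorker`, the round-robin split and the convention of passing subproblem ids as command-line arguments are all assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ProcessSplitSketch {

    // Round-robin split of subproblem ids 0..n-1 across the given number of processes.
    static List<List<Integer>> partition(int n, int processes) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int p = 0; p < processes; p++) {
            chunks.add(new ArrayList<>());
        }
        for (int id = 0; id < n; id++) {
            chunks.get(id % processes).add(id);
        }
        return chunks;
    }

    // Builds the command line for one worker JVM; workerMain is a hypothetical class name.
    static List<String> workerCommand(String workerMain, List<Integer> ids) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(workerMain);
        for (int id : ids) {
            cmd.add(String.valueOf(id));
        }
        return cmd;
    }

    // Starts one worker process per chunk and waits for all of them to exit.
    static void runAll(String workerMain, int n, int processes) throws Exception {
        List<Process> running = new ArrayList<>();
        for (List<Integer> chunk : partition(n, processes)) {
            running.add(new ProcessBuilder(workerCommand(workerMain, chunk)).inheritIO().start());
        }
        for (Process process : running) {
            process.waitFor();
        }
    }

    public static void main(String[] args) {
        // Only prints the commands; runAll needs a real worker class on the classpath.
        for (List<Integer> chunk : partition(10, 3)) {
            System.out.println(workerCommand("com.example.SubproblemWorker", chunk));
        }
    }
}
```

Each worker process gets its own address space and heap, at the cost of having to pass results back through files, pipes or sockets.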
Upvotes: 2
Reputation: 4838
I think this performance difference is due to how the OS manages threads. The JVM hides most OS differences. There are many sites where you can read about it, like this, for example. But that does not mean the differences disappear.
I assume you are running on a Java 8+ JVM. If so, I suggest you try the stream and functional programming features. Functional programming is very useful when you have many small independent problems and want to switch easily from sequential to parallel execution. The good news is that you don't have to define a policy for how many threads to manage (as you do with the ExecutorService). Just as an example (taken from here):
    package com.mkyong.java8;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    public class ParallelExample4 {

        public static void main(String[] args) {
            long count = Stream.iterate(0, n -> n + 1)
                    .limit(1_000_000)
                    //.parallel() with this 23s, without this 1m 10s
                    .filter(ParallelExample4::isPrime)
                    .peek(x -> System.out.format("%s\t", x))
                    .count();
            System.out.println("\nTotal: " + count);
        }

        public static boolean isPrime(int number) {
            if (number <= 1) return false;
            return !IntStream.rangeClosed(2, number / 2).anyMatch(i -> number % i == 0);
        }
    }
Result:
For normal streams, it takes 1 minute 10 seconds. For parallel streams, it takes 23 seconds. P.S. Tested with an i7-7700, 16 GB RAM, Windows 10.
So, I suggest you read about functional programming, streams and lambda functions in Java, and try a small number of tests with your code (adapted to work in this new context).
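Applied to the question's setting, such a test could look like the following minimal sketch. `solveSubproblem` is a hypothetical placeholder for the native solver call, and the class name is made up. Keep in mind that parallel streams run on the common ForkJoinPool by default, so the thread count is not unmanaged; it can be adjusted with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelSubproblemsSketch {

    // Hypothetical stand-in for the native LP solver call (assumption, not the real API).
    static double solveSubproblem(int id) {
        return id * 2.0;
    }

    // Solves subproblems 0..n-1 in parallel; results stay in id order.
    static List<Double> solveAll(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToDouble(ParallelSubproblemsSketch::solveSubproblem)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(solveAll(10));
    }
}
```

Removing the `.parallel()` call gives the sequential baseline, which makes it easy to compare both variants on Windows and Linux.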
Upvotes: 0
Reputation: 670
It sounds like Windows is paging some memory out to the pagefile after it has been untouched for some time, and that is why the CPU ends up bottlenecked by disk speed.
You can verify this with Process Explorer by checking how much memory has been paged out.
Upvotes: 0