Reputation: 106371
I am fine-tuning a numerical algorithm in pure Java that is quite sensitive to the size of the per-core processor cache: it runs noticeably faster when the working data set fits within L1 cache.
Obviously I can fine-tune this for my local machine with a bit of benchmarking. But ideally I'd like to be able to adjust the size of the working set automatically, based on the size of the L1 cache of the processor being used.
Native code is not an option: the whole point of writing this algorithm in Java is to make it platform independent!
Is there a good way to reliably determine the size of the per-core cache in pure Java?
Upvotes: 2
Views: 1248
Reputation: 11
public class CacheLine {
    public static void main(String[] args) {
        CacheLine cacheLine = new CacheLine();
        cacheLine.startTesting();
    }

    private void startTesting() {
        byte[] array = new byte[128 * 1024];
        // Repeat the whole sweep several times so later runs execute with a warmed-up JIT.
        for (int testIndex = 0; testIndex < 10; testIndex++) {
            testMethod(array);
            System.out.println("--------- // ---------");
        }
    }

    private void testMethod(byte[] array) {
        // Grow the working set in 8 KiB steps up to 128 KiB.
        for (int len = 8192; len <= array.length; len += 8192) {
            long t0 = System.nanoTime();
            for (int i = 0; i < 10000; i++) {
                // Touch one byte per 64-byte cache line.
                for (int k = 0; k < len; k += 64) {
                    array[k] = 1;
                }
            }
            long dT = System.nanoTime() - t0;
            // dT / len is the time per byte of working set.
            System.out.println("len: " + len / 1024 + " dT: " + dT + " dT/stepCount: " + dT / len);
        }
    }
}
This code helps you determine the L1 data cache size: the per-byte time stays roughly flat while the working set fits in cache, then jumps once it no longer does. You can read about it in more detail here: https://medium.com/@behzodbekqodirov/threading-in-java-194b7db6c1de#.kzt4w8eul
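Instead of eyeballing the printed timings, the knee in the curve can be picked out programmatically. The following is a hypothetical sketch built on the same sweep: it records the per-byte time for each length and returns the largest length whose time is still close to the minimum. The 1.3x threshold and the 256 KiB upper bound are assumptions, not part of the original code.

```java
public class CacheSizeEstimator {

    public static int estimateL1Bytes() {
        byte[] array = new byte[256 * 1024];
        int step = 8 * 1024;
        double[] perByte = new double[array.length / step + 1];
        // First sweep is a warm-up so the JIT compiles the loops before we trust the timings.
        sweep(array, step, perByte);
        sweep(array, step, perByte);

        double min = Double.MAX_VALUE;
        for (int i = 1; i < perByte.length; i++) min = Math.min(min, perByte[i]);

        // Take the largest contiguous prefix of "fast" sizes (assumed threshold: 1.3x the minimum).
        int best = step;
        for (int i = 1; i < perByte.length; i++) {
            if (perByte[i] > min * 1.3) break;
            best = i * step;
        }
        return best;
    }

    private static void sweep(byte[] array, int step, double[] perByte) {
        for (int len = step; len <= array.length; len += step) {
            long t0 = System.nanoTime();
            for (int i = 0; i < 2000; i++) {
                // Touch one byte per 64-byte cache line, same as the sweep above.
                for (int k = 0; k < len; k += 64) array[k] = 1;
            }
            perByte[len / step] = (System.nanoTime() - t0) / (2000.0 * len);
        }
    }

    public static void main(String[] args) {
        System.out.println("Estimated L1 data cache: " + estimateL1Bytes() / 1024 + " KiB");
    }
}
```

Note that timing noise, prefetching, and the L2 cache can blur the knee, so a run-to-run spread of one step size or so is to be expected.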
Upvotes: 0
Reputation: 1502006
If it runs noticeably faster with one set of parameters than another, then I would adjust it based on noticing that difference. Before you start doing a long set of calculations (which I assume is the case, otherwise you wouldn't care), run smaller sets with various different sizes of internal data store. (I'm assuming the algorithm can just be adjusted numerically like that.)
That way it doesn't really matter whether the difference comes from the L1 cache size, or perhaps the L1 + L2 cache size, or something else entirely - you'll pick whatever's best for the situation at hand.
You'll need to be careful of JIT warm-up periods, just as in normal benchmarking, but I think this is a good way of creating a general optimization approach, even if it ends up happening to take account of the L1 cache most heavily.
You could potentially have this as a separate install-time piece of work, which writes the results to a configuration file, so that on subsequent runs you can avoid the extra work. (You'd probably want a way of rerunning the tuning step, in case the processor changes or whatever.)
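The install-time tuning idea above could be sketched roughly as follows, assuming the algorithm's working-set size is a simple numeric parameter. The file name `tuning.properties`, the candidate sizes, and the `runKernel` stand-in are all illustrative choices, not from the answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class WorkingSetTuner {
    // Hypothetical file name; the answer only suggests "a configuration file".
    static final Path CONFIG = Paths.get("tuning.properties");

    public static void main(String[] args) {
        System.out.println("Best working set: " + bestSizeBytes() / 1024 + " KiB");
    }

    static int bestSizeBytes() {
        try {
            // Reuse an earlier result so the tuning cost is paid once, e.g. at install time.
            if (Files.exists(CONFIG)) {
                Properties p = new Properties();
                try (var in = Files.newInputStream(CONFIG)) { p.load(in); }
                String cached = p.getProperty("workingSetBytes");
                if (cached != null) return Integer.parseInt(cached);
            }

            int best = 8 * 1024;
            double bestPerByte = Double.MAX_VALUE;
            for (int size = 8 * 1024; size <= 512 * 1024; size *= 2) {
                byte[] data = new byte[size];
                runKernel(data); // warm-up pass so the JIT compiles the loop before timing
                long t0 = System.nanoTime();
                for (int rep = 0; rep < 50; rep++) runKernel(data);
                double perByte = (System.nanoTime() - t0) / (50.0 * size);
                if (perByte < bestPerByte) { bestPerByte = perByte; best = size; }
            }

            // Persist the winner so subsequent runs skip the tuning step entirely;
            // deleting the file reruns it (e.g. after a processor change).
            Properties p = new Properties();
            p.setProperty("workingSetBytes", Integer.toString(best));
            try (var out = Files.newOutputStream(CONFIG)) { p.store(out, "auto-tuned"); }
            return best;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the real algorithm's inner loop: touch one byte per 64-byte cache line.
    static void runKernel(byte[] data) {
        for (int i = 0; i < data.length; i += 64) data[i]++;
    }
}
```

As the answer notes, what gets picked may reflect L1, L1 + L2, or something else entirely; the point is that it is the fastest setting on this machine, whatever the cause.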
Upvotes: 5