mikera

Reputation: 106351

Optimizing processing and management of large Java data arrays

I'm writing some pretty CPU-intensive, concurrent numerical code that will process large amounts of data stored in Java arrays (e.g. lots of double[100000]s). Some of the algorithms might run millions of times over several days so getting maximum steady-state performance is a high priority.

In essence, each algorithm is a Java object that has a method API something like:

   public double[] runMyAlgorithm(double[] inputData);

or, alternatively, a reference to an array in which to store the output data could be passed in:

   public void runMyAlgorithm(double[] inputData, double[] outputData);

Given this requirement, I'm trying to determine the optimal strategy for allocating / managing array space. Frequently the algorithms will need large amounts of temporary storage space. They will also take large arrays as input and create large arrays as output.

Among the options I am considering are:

Any thoughts on which approach would be best (and why)?

Upvotes: 1

Views: 2190

Answers (2)

Mihai Toader

Reputation: 12243

What I have noticed when working with memory in Java is the following: if your memory usage patterns are simple (mostly 2-3 types of allocation), you can usually do better than the default allocator. You can either preallocate a pool of buffers at application startup and hand them out as needed, or go the other route (allocate one huge array at the beginning and carve pieces out of it on demand). In effect you are writing your own memory allocator, but chances are you will do a worse job than Java's default allocator.
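
A minimal sketch of the first route (the class and method names here are made up for illustration, not from the answer): a pool of fixed-size double[] buffers that are handed out and returned instead of being reallocated. This version is not thread-safe; a concurrent variant would need a concurrent deque or locking.

   import java.util.ArrayDeque;

   // Sketch of a preallocated pool of fixed-size double[] buffers.
   public class DoubleBufferPool {
       private final int bufferSize;
       private final ArrayDeque<double[]> free = new ArrayDeque<>();

       public DoubleBufferPool(int bufferSize, int initialCount) {
           this.bufferSize = bufferSize;
           for (int i = 0; i < initialCount; i++) {
               free.push(new double[bufferSize]);
           }
       }

       // Hand out a pooled buffer, or allocate a fresh one if the pool is empty.
       public double[] acquire() {
           double[] buf = free.poll();
           return (buf != null) ? buf : new double[bufferSize];
       }

       // Return a buffer to the pool so it can be reused.
       public void release(double[] buf) {
           if (buf.length == bufferSize) {
               free.push(buf);
           }
       }
   }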

I would probably try to do the following: standardize the buffer sizes and allocate normally. That way, after a while, the only allocations and deallocations will be in a few fixed sizes, which greatly helps the garbage collector run fast. Another thing I would do is make sure at algorithm design time that the total memory needed at any one point does not exceed roughly 80-85% of the machine's memory, so as not to trigger a full collection inadvertently.
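
As a rough illustration of standardizing buffer sizes (the helper names and the power-of-two rule are mine, not from the answer), one simple policy is to round every requested length up to the next power of two, so all allocations come in a handful of fixed sizes:

   // Sketch: round requested buffer lengths up to a standard (power-of-two) size.
   public final class BufferSizes {
       private BufferSizes() {}

       // Smallest power of two >= requested (minimum 1).
       public static int standardSize(int requested) {
           int size = 1;
           while (size < requested) {
               size <<= 1;
           }
           return size;
       }

       public static double[] allocate(int requested) {
           return new double[standardSize(requested)];
       }
   }

With this rule, a request for 100000 doubles gets a double[131072]; the roughly 30% overhead is the price of having only a few distinct allocation sizes.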

Apart from those heuristics, I would probably test the hell out of any solution I picked and see how it works in practice.

Upvotes: 2

Peter Lawrey

Reputation: 533442

Allocating big arrays is relatively cheap for the GC. You tend to use up your Eden space quickly, but the cost is largely per object. I suggest you write the code in the simplest manner possible and optimise it later after profiling the application. A double[100000] is less than a MB, and you can fit over a thousand of them in a GB.
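
For what "the simplest manner possible" might look like (the class name and the squaring loop are placeholders, not from the answer), each call just allocates a fresh output array and lets the young-generation collector reclaim it:

   // Illustrative only: allocate per call and leave cleanup to the GC.
   public class SquareAlgorithm {
       public double[] runMyAlgorithm(double[] inputData) {
           double[] outputData = new double[inputData.length]; // ~800 KB for double[100000]
           for (int i = 0; i < inputData.length; i++) {
               outputData[i] = inputData[i] * inputData[i];     // stand-in for the real work
           }
           return outputData;
       }
   }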

Memory is a lot cheaper than it used to be. An 8 GB server costs about £850, and a 24 GB server costs about £1,800. (A 24 GB machine could hold around 24K x double[100000].) You may find that a large heap size, or even a large Eden size, gives you the efficiency you want.

Upvotes: 2
