Reputation: 365
I have a performance situation at hand.
I have a huge amount of data to be held in memory in a two dimensional table format (12000 X 2000). Now as far as my knowledge goes either I can use int[][]
or List<List<Integer>>
. And, of course, I access the values using int[i][j]
or list.get(i).get(j)
. I am looping through the entire data at least five times.
Which one do you think will work faster and, if you can answer, why? Also is there any way to speed up the execution?
My java -version
gives:
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) Client VM (build 20.4-b02, mixed mode, sharing)
The OS is Windows Vista.
Upvotes: 1
Views: 6461
Reputation:
1) Benchmark your application as a whole. Don't assume that you know where the perf bottlenecks in your application are. Experience shows again and again and again that humans generally suck at this. Do this on hardware and systems which are identical to production, or you're wasting your time.
2) Don't forget to structure your benchmark in such a way that the JIT compiler has kicked in for the code you care about. 10000 iterations of a method are typically needed before a method is compiled. Benchmarking interpreted-mode code is a total waste of time.
3) In an application where the most significant bottlenecks have been dealt with, many applications will be in a state where the performance profile is dominated by the number of processor L1 cache misses. You can regard this as being the point at which your application is reasonably well-tuned. Your algorithms may still suck however, and there may still be loads of busywork going on in the system that you can get rid of.
4) Assuming that your algorithms don't suck and that you have no major chunks of busywork that you can get rid of, if the array / List difference is truly significant for you then it's at this point that you'll start to see it in the perf numbers.
5) Under most circumstances, you will find that the L1 cache situation will be better for arrays than for lists. However, this is general advice, not to be mistaken for actual performance tuning advice. Generate your own perf numbers and analyse them.
tl;dr version: Read the long version. tl;dr has no place in Java performance discussion - this is subtle and complex stuff and the nuances matter.
Upvotes: 2
Reputation: 3508
Here is a simple benchmark that shows the primitive arrays to be much faster. The cost of boxing will make arrays slower though.
Results:
Results summary:
Geo. Mean Primitive Array time: 0.7010723914083877 ms
Geo. Mean Boxed Array time: 2.517326382701606 ms
Geo. Mean ArrayList time: 1.1690484729741475 ms
Geo. Mean LinkedList time: 2.3522075667709146 ms
Code:
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
/**
* User: shams
* Date: 11/23/11
* Time: 9:30 AM
*/
public class Benchmark {
public static void main(String[] args) {
final int ROW_SIZE = 1200;
final int COL_SIZE = 200;
final int numIterations = 10;
final List<Double> arrayPrimitiveTimes = new LinkedList<Double>();
final List<Double> arrayBoxedTimes = new LinkedList<Double>();
final List<Double> linkedListTimes = new LinkedList<Double>();
final List<Double> arrayListTimes = new LinkedList<Double>();
for (int i = 0; i < numIterations; i++) {
{
tryGarbageCollection();
startReportingTime();
final int[][] dataArray = new int[ROW_SIZE][COL_SIZE];
runPrimitiveArrayCode(dataArray);
arrayPrimitiveTimes.add(endReportingTime("Primitive Array time: "));
}
{
tryGarbageCollection();
startReportingTime();
final Integer[][] dataArray = new Integer[ROW_SIZE][COL_SIZE];
runBoxedArrayCode(dataArray);
arrayBoxedTimes.add(endReportingTime("Boxed Array time: "));
}
{
tryGarbageCollection();
startReportingTime();
final List<List<Integer>> arrayList = new ArrayList<List<Integer>>(ROW_SIZE);
for (int r = 0; r < ROW_SIZE; r++) {
arrayList.add(new ArrayList<Integer>(COL_SIZE));
}
runListCode(arrayList);
arrayListTimes.add(endReportingTime("ArrayList time: "));
}
{
tryGarbageCollection();
startReportingTime();
final List<List<Integer>> arrayList = new LinkedList<List<Integer>>();
for (int r = 0; r < ROW_SIZE; r++) {
arrayList.add(new LinkedList<Integer>());
}
runListCode(arrayList);
linkedListTimes.add(endReportingTime("LinkedList time: "));
}
}
System.out.println("\n\n Results summary: ");
printResult("Geo. Mean Primitive Array time: ", getMiddleGeoMeanTime(arrayPrimitiveTimes));
printResult("Geo. Mean Boxed Array time: ", getMiddleGeoMeanTime(arrayBoxedTimes));
printResult("Geo. Mean ArrayList time: ", getMiddleGeoMeanTime(arrayListTimes));
printResult("Geo. Mean LinkedList time: ", getMiddleGeoMeanTime(linkedListTimes));
}
private static void runPrimitiveArrayCode(final int[][] dataArray) {
for (int i = 0; i < dataArray.length; i++) {
int[] cached = dataArray[i];
for (int j = 0; j < cached.length; j++) {
cached[j] = cached[j] + i + j;
}
}
}
private static void runBoxedArrayCode(final Integer[][] dataArray) {
for (int i = 0; i < dataArray.length; i++) {
Integer[] cached = dataArray[i];
for (int j = 0; j < cached.length; j++) {
Integer oldData = cached[j]; // dummy read
cached[j] = i + j + (oldData == null ? 0 : 1);
}
}
}
private static void runListCode(final List<List<Integer>> dataArray) {
for (int i = 0; i < dataArray.size(); i++) {
final List<Integer> cached = dataArray.get(i);
for (int j = 0; j < cached.size(); j++) {
cached.set(j, cached.get(j) + i + j);
}
}
}
public static void tryGarbageCollection() {
int count = 0;
int limit = 2;
while (count < limit) {
count += 1;
// println("enforceGarbageCollection: starting enforce of GC")
int attempts = 0;
WeakReference<Object> wr = new WeakReference<Object>(new Object());
while (wr.get() != null && attempts < 25) {
// add some delay
int busy = 0;
while (busy < 100) {
busy += 1;
wr.get();
}
new Object();
System.out.print(".");
System.gc();
attempts += 1;
}
// println("enforceGarbageCollection: done GC")
}
}
private static long startTime = 0;
public static void startReportingTime() {
startTime = System.nanoTime();
}
public static double endReportingTime(String msg) {
long newTime = System.nanoTime();
double execTime = (newTime - startTime) / 1e6;
System.out.println(msg + execTime + "ms");
return execTime;
}
public static double getBestTime(List data) {
if (data.isEmpty()) {
return 0;
} else {
java.util.Collections.sort(data);
return ((Double) data.get(0)).doubleValue();
}
}
public static double getMiddleGeoMeanTime(List<Double> data) {
java.util.Collections.sort(data);
List<Double> sortedResult = data;
double midValuesProduct = 1.0;
int midValuesCount = 0;
for (int i = 1; i < sortedResult.size() - 1; i++) {
midValuesCount += 1;
midValuesProduct *= sortedResult.get(i).doubleValue();
}
final double average;
if (midValuesCount > 0) {
average = Math.pow(midValuesProduct, 1.0 / midValuesCount);
} else {
average = 0.0;
}
return average;
}
public static void printResult(String msg, double timeInMs) {
System.out.println(msg + " " + timeInMs + " ms");
}
}
Upvotes: 1
Reputation: 35341
It depends on the List
implementation you are using. If you are using an ArrayList
(the one most people use), then performance is going to be essentially identical to an array. But if you are using a LinkedList
, then performance will be significantly worse because LinkedLists
are very slow when it comes to random access.
When you are creating the data, if you are using an ArrayList
, you should initialize the size of its internal array by passing a number into the constructor. Otherwise, initializing the ArrayList
will be significantly slower than initializing an array. This is because, when the ArrayList
's internal array runs out of space, the ArrayList
creates a new, larger array. It then copies all the elements from the old array into the new array. This results in significant performance loss.
int list[][] = new int[12000][2000];
//--or--
List<List<Integer>> list = new ArrayList<List<Integer>>(12000);
for (int i = 0; i < 12000; i++){
list.add(new ArrayList<Integer>(2000));
}
Upvotes: 1
Reputation: 160181
The array will almost certainly be faster.
Using an ArrayList
will bring the performance more in-line since it's backed by an actual array.
Edit to summarize comments
For this usecase I believe the arrays will be measurably faster. Whether it's faster enough to matter is a different issue, and I don't know enough about the actual problem being solved to make a judgement on that.
Upvotes: 6
Reputation: 4182
...of course, the int[][] will use less memory too. If possible, try using byte[][] or short[][] to further reduce memory usage.
Assuming a 32-bit architecture, 12000x2000 equates to 91MB. If bytes are sufficient, then it will be 1/4 the size. Furthermore, there may be performance improvements as well (architecture-dependent).
Upvotes: 1
Reputation: 115328
If list implements RandomAccess
(e.g. ArrayList
) it almost does not cause any performance degradation. If you are using LinkedList
random access to its members can be very expensive.
Lists bring you a very serious benefit: they can grow automatically. And lists are collections that gives you certain benefits in copying from one collection to other (e.g. from map to list etc.)
So you choice should depend on the fact whether you need your list to grow automatically and whether the performance issues are really very important for you. In most cases they are not.
And the last remark. I think that both N-dimensional arrays and list are not the best choice. If you need N dimensions where N>1 create class and store its instances into 1-dimensional array or collection.
Upvotes: 1
Reputation: 27727
There is an extensive discussion on this here:
Array or List in Java. Which is faster?
Here is the benchmark conclusion:
I wrote a little benchmark to compare ArrayLists with Arrays. On my old-ish laptop, the time to traverse through a 5000-element arraylist, 1000 times, was about 10 milliseconds slower than the equivalent array code.
Upvotes: 0
Reputation: 30228
I think two-dimensional array will be faster in most cases, but why don't you test it on your specific problem?
Upvotes: 0