Reputation: 309
Consider the following nested loops that iterate over the elements of each row in the array arr:
for(int i = 0; i < arr.length; i++)
    for(int j = 0; j < arr[0].length; j++)
        //some code here involving arr[i][j]
Considering that Java does not have true "2D arrays" but rather only arrays of arrays, would this be as efficient as the following nested loops iterating over the columns of the array?
for(int j = 0; j < arr[0].length; j++)
    for(int i = 0; i < arr.length; i++)
        //some code involving arr[i][j]
Even though arr[i][j] is O(1), would I see any difference in the time it takes to run one solution over the other due to Java's implementation of 2D arrays?
Upvotes: 4
Views: 1574
Reputation: 5575
Unless your array is sized in the range of billions of rows and columns, any performance-related rearranging of the loops should be considered premature optimization.
What is relevant here is readability (most developers in Western countries would expect the code to iterate left-to-right and then top-to-bottom).
Additionally, the modern (and much more readable) enhanced for loop only works in that row-first order anyway:
for (int[] row : array) {
    for (int value : row) { // or whatever type your values are
        // some operations involving value
    }
}
Upvotes: -1
Reputation: 4723
An array in Java is a contiguous block of elements in memory. For an int[] the element type is int, so an array of int is a contiguous block of int values in memory.
Now the important thing: a variable of type int[] itself holds a reference to an int array, or rather a pointer to its first element, as you may know it from the C language.
2D arrays in C are stored as one contiguous block of elements: each row lies directly next to the previous one in memory, and a 2D array of integers in C has the type int[][]. In Java this is not the case. A 2D array in Java is an array of arrays, i.e. a 2D array of integers is an array of int[]. Generally speaking, a 2D array is an array of individual objects, and those objects are not stored next to each other in memory. As mentioned, an int[] itself holds a reference to an integer array, so an int[][] is an array of references, each of which refers to an int[].
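As a small illustration (class and variable names here are mine, not from the question), the following sketch shows that each row of an int[][] is an independent int[] object: rows can be replaced, resized, or even shared.

public class ArrayOfArraysDemo {
    public static void main(String[] args) {
        // A "2D array" in Java: an array of references to separate int[] objects.
        int[][] grid = new int[3][4];

        System.out.println(grid[0] == grid[1]); // false: distinct row objects
        System.out.println(grid[0].getClass()); // class [I, i.e. an int[]

        grid[1] = new int[10];                  // rows may have different lengths ("jagged" array)

        grid[2] = grid[0];                      // two slots can share one row object
        grid[0][0] = 42;
        System.out.println(grid[2][0]);         // prints 42: same underlying row
    }
}

None of this would be possible if the whole 2D array were one contiguous block, and it is the reason the rows of an int[][] may end up scattered across the heap.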
Now to the question: the biggest factor for performance when comparing the two ways of iterating over a 2D array is caching. The CPU uses caches to increase performance, because a cache, sitting much closer to the CPU, provides faster access to values than main memory does. What you want to achieve is that one int[] is accessed element by element, then the next int[] element by element. What you don't want is that the first element of each int[] is accessed, then the second of each, and so on. This is because you want to cache an int[], use all of its values, and only then cache the next int[].
Your first example does exactly that: the elements of the first row, or rather the first inner array, are accessed, then the elements of the second one. The second example does the opposite: each first element is accessed, then each second one, and so on.
The problem is that a cached array A can be evicted from the cache because it has not been accessed for some time, making room for an array B whose n-th element is needed next. But remember, you will need A again for the (n+1)-th element, and then B again. You want an array to stay in the cache only as long as you need it, so ideally you cache an array, immediately use all of its values, and then replace the cached array with the next one. Otherwise you pay overhead for writing to the cache and moving values between cache and memory for no reason.
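To make that concrete, here is a minimal sketch of the cache-friendly order (the method name and the summing operation are mine, just stand-ins for "some code involving arr[i][j]"): hoist the current row into a local int[] and finish it before moving on to the next row.

static long sumRowByRow(int[][] arr) {
    long sum = 0;
    for (int i = 0; i < arr.length; i++) {
        int[] row = arr[i];              // one contiguous int[]; fetch it once
        for (int j = 0; j < row.length; j++) {
            sum += row[j];               // sequential access within the same block of memory
        }
    }
    return sum;
}

A column-first version would touch a different row object on every single access, which is exactly the unfavourable pattern described above.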
Of course, all of this is very technical and highly dependent on the compiler and the machine your code runs on, but I want to give you an idea of which approach is more performant in theory.
Upvotes: 1
Reputation: 3131
This is my first benchmark ever, so it might be incorrect!
The research about the difference in other languages (especially C/C++) made me curious, so I decided to try writing a benchmark (and to learn how to do one at the same time).
Result summary:
Benchmark (n) Mode Cnt Score Error Units
MainBenchmark.columnFirst 10000 avgt 5 1921,752 ± 341,941 ms/op
MainBenchmark.rowFirst 10000 avgt 5 381,053 ± 44,640 ms/op
It seems that row-major order (I think that's the correct name) is about 5 times faster than column-major.
In day-to-day programming this won't really matter; you will almost never need to optimize something like this. It was done just for science.
Here is the benchmark I wrote; it creates an int[10000][10000] array and iterates over all of its elements:
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Benchmark)
@Fork(value = 1, warmups = 2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class MainBenchmark {

    @Param({"10000"})
    private int n;

    private int[][] testData;

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(MainBenchmark.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }

    @Setup
    public void setup() {
        testData = createData();
    }

    // Iterate one row (one inner int[]) at a time.
    @Benchmark
    public void rowFirst(Blackhole bh) {
        for (int i = 0; i < testData.length; i++) {
            for (int j = 0; j < testData[0].length; j++) {
                int tmp = testData[i][j];
                bh.consume(tmp);
            }
        }
    }

    // Iterate down the columns, touching a different row object on every access.
    @Benchmark
    public void columnFirst(Blackhole bh) {
        for (int j = 0; j < testData[0].length; j++) {
            for (int i = 0; i < testData.length; i++) {
                int tmp = testData[i][j];
                bh.consume(tmp);
            }
        }
    }

    // Allocate and zero-fill an n x n array.
    private int[][] createData() {
        int[][] ints = new int[n][n];
        for (int[] anInt : ints) {
            Arrays.fill(anInt, 0);
        }
        return ints;
    }
}
Here are the full results:
# JMH version: 1.23
# VM version: JDK 13.0.1, OpenJDK 64-Bit Server VM, 13.0.1+9
# VM invoker: C:\Program Files\Java\jdk-13.0.1\bin\java.exe
# VM options: -Dvisualvm.id=957546995472200 -javaagent:(...) -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmarking.MainBenchmark.columnFirst
# Parameters: (n = 10000)
# Run progress: 0,00% complete, ETA 00:10:00
# Warmup Fork: 1 of 2
# Warmup Iteration 1: 1810,536 ms/op
# Warmup Iteration 2: 1883,026 ms/op
# Warmup Iteration 3: 1798,335 ms/op
# Warmup Iteration 4: 1806,877 ms/op
# Warmup Iteration 5: 1797,246 ms/op
Iteration 1: 1794,506 ms/op
Iteration 2: 1822,085 ms/op
Iteration 3: 1845,853 ms/op
Iteration 4: 2000,127 ms/op
Iteration 5: 2045,922 ms/op
# Run progress: 16,67% complete, ETA 00:09:15
# Warmup Fork: 2 of 2
# Warmup Iteration 1: 1780,858 ms/op
# Warmup Iteration 2: 1771,650 ms/op
# Warmup Iteration 3: 1786,517 ms/op
# Warmup Iteration 4: 2198,348 ms/op
# Warmup Iteration 5: 1742,218 ms/op
Iteration 1: 2124,944 ms/op
Iteration 2: 2187,857 ms/op
Iteration 3: 1905,843 ms/op
Iteration 4: 1925,476 ms/op
Iteration 5: 1785,446 ms/op
# Run progress: 33,33% complete, ETA 00:07:22
# Fork: 1 of 1
# Warmup Iteration 1: 2082,695 ms/op
# Warmup Iteration 2: 1783,062 ms/op
# Warmup Iteration 3: 1799,518 ms/op
# Warmup Iteration 4: 1800,832 ms/op
# Warmup Iteration 5: 1974,720 ms/op
Iteration 1: 1934,673 ms/op
Iteration 2: 2013,677 ms/op
Iteration 3: 1784,654 ms/op
Iteration 4: 1895,396 ms/op
Iteration 5: 1980,359 ms/op
Result "benchmarking.MainBenchmark.columnFirst":
1921,752 ±(99.9%) 341,941 ms/op [Average]
(min, avg, max) = (1784,654, 1921,752, 2013,677), stdev = 88,801
CI (99.9%): [1579,811, 2263,693] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 13.0.1, OpenJDK 64-Bit Server VM, 13.0.1+9
# VM invoker: C:\Program Files\Java\jdk-13.0.1\bin\java.exe
# VM options: -Dvisualvm.id=957546995472200 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2019.2.4\lib\idea_rt.jar=51059:C:\Program Files\JetBrains\IntelliJ IDEA 2019.2.4\bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmarking.MainBenchmark.rowFirst
# Parameters: (n = 10000)
# Run progress: 50,00% complete, ETA 00:05:32
# Warmup Fork: 1 of 2
# Warmup Iteration 1: 381,809 ms/op
# Warmup Iteration 2: 394,792 ms/op
# Warmup Iteration 3: 384,524 ms/op
# Warmup Iteration 4: 389,858 ms/op
# Warmup Iteration 5: 378,686 ms/op
Iteration 1: 373,117 ms/op
Iteration 2: 371,832 ms/op
Iteration 3: 373,667 ms/op
Iteration 4: 384,930 ms/op
Iteration 5: 377,080 ms/op
# Run progress: 66,67% complete, ETA 00:03:37
# Warmup Fork: 2 of 2
# Warmup Iteration 1: 381,334 ms/op
# Warmup Iteration 2: 383,445 ms/op
# Warmup Iteration 3: 387,772 ms/op
# Warmup Iteration 4: 410,992 ms/op
# Warmup Iteration 5: 374,811 ms/op
Iteration 1: 383,491 ms/op
Iteration 2: 389,619 ms/op
Iteration 3: 388,545 ms/op
Iteration 4: 369,743 ms/op
Iteration 5: 372,389 ms/op
# Run progress: 83,33% complete, ETA 00:01:47
# Fork: 1 of 1
# Warmup Iteration 1: 368,873 ms/op
# Warmup Iteration 2: 383,034 ms/op
# Warmup Iteration 3: 394,592 ms/op
# Warmup Iteration 4: 373,512 ms/op
# Warmup Iteration 5: 373,946 ms/op
Iteration 1: 394,551 ms/op
Iteration 2: 392,298 ms/op
Iteration 3: 376,282 ms/op
Iteration 4: 372,903 ms/op
Iteration 5: 369,230 ms/op
Result "benchmarking.MainBenchmark.rowFirst":
381,053 ±(99.9%) 44,640 ms/op [Average]
(min, avg, max) = (369,230, 381,053, 394,551), stdev = 11,593
CI (99.9%): [336,412, 425,693] (assumes normal distribution)
# Run complete. Total time: 00:10:42
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark (n) Mode Cnt Score Error Units
MainBenchmark.columnFirst 10000 avgt 5 1921,752 ± 341,941 ms/op
MainBenchmark.rowFirst 10000 avgt 5 381,053 ± 44,640 ms/op
Process finished with exit code 0
Upvotes: 2