cybye

Reputation: 1171

Java VM memory performance - Are Array writes faster than Array reads?

I performed a short benchmark on a long array in Java, with quite strange results. It seems that sequential reads with random writes are faster (taking half the time) than random reads with sequential writes. Does anyone have a clue why?

Here are two methods that operate on an array of longs (run with -Xmx2G or so): one reads sequentially and writes to random positions, the other reads from random positions and writes sequentially:

import java.util.Random;


public class Scratch {
    static Random random = new Random();
    static long[] arr = new long[100000000]; // 100M longs, ~800MB

    // read sequentially, write to a random index
    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = random.nextInt(arr.length);
            arr[at] = arr[i];
        }
    }

    // read from a random index, write sequentially
    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = random.nextInt(arr.length);
            arr[i] = arr[at];
        }
    }

    public static void main(String[] args) throws Exception {

        seqWriteRandRead(); // warm up

        long nanos = System.nanoTime();
        seqReadRandWrite();
        System.out.println("Time: " + (System.nanoTime() - nanos) + "ns");

        nanos = System.nanoTime();
        seqWriteRandRead();
        System.out.println("Time: " + (System.nanoTime() - nanos) + "ns");
    }
}

The results on my notebook are:

Time: 2774662168ns

Time: 6059499068ns

Which means it's twice as fast to write at random as to read at random... or? Is my notebook broken?

P.S.: this does not claim to be a benchmark, although most of the points in the linked advice about benchmarking are covered. Even if I run the 200,000,000 operations multiple times, the results stay quite constant. It seems (seems!) that moving memory from random positions into sequential blocks is slower than moving memory from sequential positions into random blocks, at least with memory of this size and the above way of doing it. And I wonder why?

Upvotes: 12

Views: 503

Answers (6)

Nitsan Wakart

Reputation: 2909

The answer is in the previous comments and boils down to memory access pattern effects. This blog post covers the effects of random reads. Writes do not suffer similarly: roughly speaking, a store that misses the cache can sit in the CPU's store buffer while execution continues, whereas a load that misses stalls every instruction that depends on the loaded value.

This is not a Java issue (or indeed an issue with any particular language) but a reality of the hardware you run on (and a common reality at that). This doesn't mean you should ignore it! While your initial benchmark may have been flawed, it still hits a real issue for some software, so it's a valuable lesson.

The conclusion is not that reads are more expensive than writes, though. It's that random memory access is not well catered for by the hardware. This is basically why LinkedList performance is so much worse than ArrayList's for sequential access: both have the same computational complexity, but array access plays to the hardware's strengths where a linked list doesn't, as the sketch below illustrates.
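To make that concrete, here is a minimal sketch (mine, not from the answer; the list size is arbitrary) that traverses an ArrayList and a LinkedList of the same length sequentially:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListTraversal {
    // sum the list front to back; O(n) for both implementations
    static long sum(List<Integer> list) {
        long s = 0;
        for (int v : list) s += v;
        return s;
    }

    public static void main(String[] args) {
        int n = 5000000;
        List<Integer> array = new ArrayList<Integer>(n);
        List<Integer> linked = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) { array.add(i); linked.add(i); }

        // crude warm-up so the JIT compiles sum() before we time it
        for (int i = 0; i < 3; i++) { sum(array); sum(linked); }

        long t = System.nanoTime();
        long a = sum(array);
        System.out.println("ArrayList:  " + (System.nanoTime() - t) + "ns (sum " + a + ")");

        t = System.nanoTime();
        long l = sum(linked);
        System.out.println("LinkedList: " + (System.nanoTime() - t) + "ns (sum " + l + ")");
    }
}

Both traversals do the same amount of "work", but the ArrayList's backing array is contiguous, so the hardware prefetcher can stream it; the LinkedList's nodes are scattered across the heap, so every hop is a potential cache miss.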

Upvotes: 1

Miserable Variable

Reputation: 28761

In summary, the question title is slightly incorrect. The truth seems to be that in some environments (e.g. mine and the OP's) random array writes are faster than random array reads. But note that this is not true for everyone; see the update below.

Based on @JustinKSU's comment I separated out the reads and writes, and found that random writes are indeed faster than random reads. The results are as follows. This seems to be the reason, and the collective opinion here is that read misses in the cache are more expensive than write misses (if the writes involve the cache at all).

In production, though, where there is other activity going on, HotSpot might play a role.

/cygdrive/c/Java/jdk1.7.0/bin/javac.exe Scratch.java && /cygdrive/c/Java/jdk1.7.0/bin/java Scratch
Starting
seqRead: 1273719725ns
seqRead: 1243055271ns
seqRead: 1245022497ns
seqRead: 1242868527ns
seqRead: 1241655611ns
randRead: 6900959912ns
randRead: 6965196004ns
randRead: 7379623094ns
randRead: 7020390995ns
randRead: 6938997617ns
seqWrite: 1266963940ns
seqWrite: 1250599487ns
seqWrite: 1246471685ns
seqWrite: 1230472648ns
seqWrite: 1246975416ns
randWrite: 3898382192ns
randWrite: 3897441137ns
randWrite: 3939947844ns
randWrite: 4207906037ns
randWrite: 4103594207ns


My modified code is as follows:

import java.util.Random;


public class Scratch {
    static Random random = new Random();
    static long[] arr = new long[100000000];

    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[at] = arr[i];
        }
    }

    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[i] = arr[at];
        }
    }

    static void seqRead() {
        long x = 0; // accumulate so the reads have a consumer (see the caveat after the code)
        for (int i = 0; i < arr.length; i++) {
            // 'at' is unused here, but computing it keeps the Random
            // overhead identical across all four methods
            int at = Math.abs(random.nextInt() % arr.length);
            x += arr[i];
        }
    }

    static void randRead() {
        long x = 0;
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            x += arr[at];
        }
    }

    static void seqWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[i] = at;
        }
    }

    static void randWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt() % arr.length);
            arr[at] = at;
        }
    }

    public static void main(String[] args) throws Exception {

        // seqWriteRandRead(); // warm up
        System.out.println("Starting");

        long nanos = -1;
        /*
        for (int i = 0; i < 5; i++) {
            nanos = System.nanoTime();
            seqWriteRandRead();
            System.out.println("WriteRandRead Time: " + (System.nanoTime()-nanos) + "ns");

            nanos = System.nanoTime();
            seqReadRandWrite();
            System.out.println("ReadRandWrite Time: " + (System.nanoTime()-nanos) + "ns");
        }
        */

        for (int i = 0; i < 5; i++) {
            nanos = System.nanoTime();
            seqRead();
            System.out.println("seqRead: " + (System.nanoTime()-nanos) + "ns");
        }

        for (int i = 0; i < 5; i++) {
            nanos = System.nanoTime();
            randRead();
            System.out.println("randRead: " + (System.nanoTime()-nanos) + "ns");
        }

        for (int i = 0; i < 5; i++) {
            nanos = System.nanoTime();
            seqWrite();
            System.out.println("seqWrite: " + (System.nanoTime()-nanos) + "ns");
        }

        for (int i = 0; i < 5; i++) {
            nanos = System.nanoTime();
            randWrite();
            System.out.println("randWrite: " + (System.nanoTime()-nanos) + "ns");
        }
    }
}
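One caveat with the read methods above (my observation, not part of the original answer): x is never consumed after the loop, so an aggressive JIT could in principle eliminate the reads entirely. A minimal guard is to return the accumulated value and print it once; the method name below is hypothetical:

static long randReadChecked() {
    long x = 0;
    for (int i = 0; i < arr.length; i++) {
        int at = Math.abs(random.nextInt() % arr.length);
        x += arr[at];
    }
    return x; // returning the sum keeps the reads observable
}

// in main():
// long sink = randReadChecked();
// System.out.println("sink: " + sink); // prevents dead-code elimination

The timings above suggest the reads were not eliminated here, but consuming the result removes the doubt.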

UPDATE

@tomcarchrae ran the same test on Linux, with significantly different results. Below, the first column holds the numbers from my test and the second the numbers from Tom's:

seqRead:   1273719725ns   2810487542ns  
seqRead:   1243055271ns   2780504580ns  
seqRead:   1245022497ns   2746663894ns  
seqRead:   1242868527ns   2746094469ns  
seqRead:   1241655611ns   2763107970ns  
randRead:  6900959912ns   23093543703ns 
randRead:  6965196004ns   22458781637ns 
randRead:  7379623094ns   24421031646ns 
randRead:  7020390995ns   25880250599ns 
randRead:  6938997617ns   26873823898ns 
seqWrite:  1266963940ns   4226886722ns  
seqWrite:  1250599487ns   4537680602ns  
seqWrite:  1246471685ns   3880372295ns  
seqWrite:  1230472648ns   4160499114ns  
seqWrite:  1246975416ns   4008607447ns  
randWrite: 3898382192ns   25985349107ns 
randWrite: 3897441137ns   22259835568ns 
randWrite: 3939947844ns   22556465742ns 
randWrite: 4207906037ns   22143959163ns 
randWrite: 4103594207ns   21737397817ns 

Upvotes: 1

Tom Carchrae

Reputation: 6476

Your experiment is broken, not your laptop. See here for a discussion and some tools to help measure performance: Java performance timing library

Below are some results that contradict yours. I also modified your code to be more rigorous and more careful about how it takes measurements.


My environment is Linux (Mint 14 which is based on Ubuntu 12.10) using the Sun JDK 1.6.0_38

With 1.5G of heap for the big example, i.e. -Xmx1512m


Note: interesting. It could be that my result differs because the array size below is different. I will re-run and update.

Nope: the result is similar, in that there is not much difference in the mean. What is interesting is the difference against the short run, i.e. 21092.5 (/10 = 2109.2) vs 1645.2, which may be slower because of memory paging.

Result with static long[] arr = new long[100000000]; (the original array size in the question; the times below are milliseconds, as reported by StopWatch.getTime()):

Write: DescriptiveStatistics: n: 10 min: 20893.0 max: 22190.0 mean: 21092.5 std dev: 390.90727800848117 median: 20953.5 skewness: 3.0092198852491543 kurtosis: 9.264808973899097

Read: DescriptiveStatistics: n: 10 min: 21668.0 max: 22736.0 mean: 21892.5 std dev: 318.31509546359877 median: 21766.5 skewness: 2.5034216544466124 kurtosis: 6.560838306717343


I'm not seeing a huge difference between reads and writes. I changed the experiment to measure 10 times on a slightly smaller array (so the result is the same total number of reads/writes). Feel free to re-run with a larger array or sample size.

Write: DescriptiveStatistics: n: 10 min: 1584.0 max: 1799.0 mean: 1645.2 std dev: 59.51619760853156 median: 1634.5 skewness: 2.137918517160786 kurtosis: 5.764166551997385

Read: DescriptiveStatistics: n: 10 min: 1568.0 max: 2202.0 mean: 1689.0 std dev: 186.93908693000031 median: 1623.0 skewness: 2.770215113912315 kurtosis: 8.12245132320571

Here is a modified version of your code that does more samples:

import java.util.Random;

import org.apache.commons.lang.time.StopWatch;
import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

public class Test {
    static Random random = new Random();
//  static long[] arr = new long[100000000];
    static long[] arr = new long[10000000];

    static void seqReadRandWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt()) % arr.length;
            arr[at] = arr[i];
        }
    }

    static void seqWriteRandRead() {
        for (int i = 0; i < arr.length; i++) {
            int at = Math.abs(random.nextInt()) % arr.length;
            arr[i] = arr[at];
        }
    }

    public static void main(String[] args) throws Exception {

        StopWatch timer = new StopWatch();
        int count = 10;

        // warm up
        for (int i=0; i<3; i++){
            seqReadRandWrite();
        }
        DescriptiveStatistics write = new DescriptiveStatistics();
        for (int i=0; i<count; i++){
            timer.reset();
            timer.start();
            seqReadRandWrite();
            timer.stop();
            write.addValue(timer.getTime());
        }
        System.out.println("Write: " + write);

        // warm up
        for (int i=0; i<3; i++){
            seqWriteRandRead(); 
        }
        DescriptiveStatistics read = new DescriptiveStatistics();
        for (int i=0; i<count; i++){
            timer.reset();
            timer.start();
            seqWriteRandRead();
            timer.stop();
            read.addValue(timer.getTime());
        }

        System.out.println("Read: " + read);


    }
}

Upvotes: 0

Stephen C

Reputation: 719229

Your benchmark is producing numbers that fail the "Do they make sense?" test. In such a situation, you should always double / triple / quadruple check your methodology ... BEFORE treating the numbers as a true reflection of reality.

Writing reliable benchmarks is hard. And in the case of Java, it is particularly hard, because some aspects of the Java platform can introduce systematic distortions into your benchmark measurements ... unless you specifically allow / compensate for them.

But the "check your methodology" rule applies to ALL experiments ... especially ones that produce results that don't seem to make sense. (Like neutrinos travelling faster than light ...)


The other thing to note is that once you've rewritten the benchmark to account for the confounding factors, you may still see unexpected numbers. The issue here is that the performance of a benchmark like this is likely to be sensitive to things like the size of the L1 and L2 caches, the size of cache lines, the relative speeds of the different levels of memory ... and their interplay with the exact sequences of instructions that the benchmark produces in its tight loops.

These things are complicated, difficult to analyse, and can give counterintuitive behaviour. And it is not surprising (to me) that different machines give different measured performance.

So even if the figures are real, it is still unsafe to draw any general conclusions about the speed of reads versus writes from this benchmark. Not even if you restrict them to just your laptop.
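For what it's worth, this is the kind of distortion a benchmark harness is designed to compensate for. Here is a minimal sketch (mine, not part of the original answer) of the random-access measurements written against the JMH microbenchmark library, which handles warm-up, VM forking and dead-code elimination; the class name and array size are illustrative:

import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ArrayAccessBench {

    long[] arr;
    Random random;

    @Setup
    public void setup() {
        arr = new long[10000000];
        random = new Random(42); // fixed seed so runs are comparable
    }

    @Benchmark
    public long randRead() {
        long x = 0;
        for (int i = 0; i < arr.length; i++) {
            x += arr[random.nextInt(arr.length)];
        }
        return x; // JMH consumes the return value, defeating dead-code elimination
    }

    @Benchmark
    public long randWrite() {
        for (int i = 0; i < arr.length; i++) {
            int at = random.nextInt(arr.length);
            arr[at] = at;
        }
        return arr[0];
    }
}

JMH runs each @Benchmark method in a forked, warmed-up VM and reports statistics over many iterations, which removes most of the distortions described above.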

Upvotes: 2

Semo

Reputation: 821

I believe this benchmark is of no use to you. There are a lot of measurement parameters to consider that you did not describe, and your approach to the problem is likewise undescribed. To draw any conclusions at all about speed with respect to VMs, computers, RAM speed, software running at the same time, the kinds of objects or primitives being copied, and so on, you need a methodical approach. As it stands, this question is not answerable. You must narrow down the specific circumstances under which you want to know about speed.

In particular, you cannot draw conclusions when using random numbers, since randomness greatly aggravates the problem of distinguishing best-, worst- and average-case behaviour.

Please read up on algorithmic complexity, then look into how to make scientific runtime performance measurements. I hope I could help you a bit.

The first answer here is excellent and will help you understand: How do I write a correct micro-benchmark in Java?

Best regards,

Upvotes: 0

irreputable
irreputable

Reputation: 45443

Results on my PC (ns per read/write):

seq read :     1.4 
rnd read :   10x.x   
seq write:     3.3 
rnd write:   10x.x

and seqReadRandWrite and seqWriteRandRead are equally fast, at about 100ns per loop.

So this may depend on hardware, and also on VM settings. Try java -server and see if the speed improves.

Upvotes: 0
