Why do my threads take different amounts of time to perform the same amount of work?

Question

I have a simple multithreaded application in Java which looks like this:

class MyThreads extends Thread{
    public void run() {
        {
            // some thread initializations

            // Every thread reads 2 files (its own files,
            // so node 0 will read A0.txt and B0.txt
            // and node 1 will read A1.txt and B1.txt).
            // The files are between 10 and 20 MB in size.
            // A's files contain different information for different nodes (A0.txt != A1.txt),
            // but B's files are the same (B0.txt has
            // the same info as B1.txt). This is just a scenario.

            // Each thread keeps the data it has read
            // in memory.
            // Again, I know B could be shared since
            // it has the same info in both threads, but it's not.
        }

        {
            // simple computation on the data retrieved 
            // (addition, multiplication, etc)
            // I assume there is no need to synchronize 
            // the threads since they apply operations on their own data.
            // Here, every thread executes the same number of operations
        }

        {
            // Writing the results to different files. This phase is unimportant.
        }
    }

    public static void main(String args[]) {
        // start 4 threads
    }
}

When I measured the initialization and computation phases, I got these strange results:

2016-03-11-NodeThread:1 time[2318] tag[initialization]
2016-03-11-NodeThread:0 time[2379] tag[initialization]
2016-03-11-NodeThread:2 time[2474] tag[initialization]
2016-03-11-NodeThread:3 time[2481] tag[initialization]
2016-03-11-NodeThread:2 time[30ms] tag[computation]
2016-03-11-NodeThread:1 time[6ms] tag[computation]
2016-03-11-NodeThread:3 time[7ms] tag[computation]
2016-03-11-NodeThread:0 time[6ms] tag[computation]

As one can see, the computation for NodeThread:2 took 30 ms, but for the other nodes it took less than 10 ms.

However, after inserting a barrier between the initialization and the computation, I get consistent results:

2016-03-11-NodeThread:1 time[2318] tag[initialization]
2016-03-11-NodeThread:0 time[2379] tag[initialization]
2016-03-11-NodeThread:2 time[2474] tag[initialization]
2016-03-11-NodeThread:3 time[2481] tag[initialization]
2016-03-11-NodeThread:2 time[30ms] tag[computation]
2016-03-11-NodeThread:1 time[33ms] tag[computation]
2016-03-11-NodeThread:3 time[29ms] tag[computation]
2016-03-11-NodeThread:0 time[31ms] tag[computation]
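
For illustration, the barrier placement described above might look like the following minimal sketch. It assumes `java.util.concurrent.CyclicBarrier`, since the question does not say which kind of barrier was used. The class name `BarrierSketch`, the in-memory array that stands in for the file reads, and the arithmetic are all hypothetical placeholders, and timing uses `System.nanoTime()` rather than perf4j to keep the example self-contained.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BarrierSketch {
    static final int THREADS = 4;       // matches "start 4 threads" in the question
    static final int SIZE = 1_000_000;  // hypothetical data size per thread

    // Runs both phases in every thread and returns each thread's computed sum.
    static double[] runWithBarrier() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(THREADS);
        double[] sums = new double[THREADS];
        long[] computeNanos = new long[THREADS];
        Thread[] workers = new Thread[THREADS];

        for (int i = 0; i < THREADS; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // Initialization: stand-in for reading A<id>.txt and B<id>.txt.
                double[] data = new double[SIZE];
                for (int k = 0; k < data.length; k++) {
                    data[k] = k + id;
                }

                try {
                    // No thread starts computing until all have finished initializing.
                    barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                    return;
                }

                // Computation: each thread works only on its own data.
                long start = System.nanoTime();
                double sum = 0;
                for (double d : data) {
                    sum += d * 2.0;
                }
                computeNanos[id] = System.nanoTime() - start;
                sums[id] = sum;
            }, "NodeThread:" + id);
            workers[i].start();
        }

        // join() makes the workers' array writes visible to this thread.
        for (Thread t : workers) {
            t.join();
        }
        for (int i = 0; i < THREADS; i++) {
            System.out.println("NodeThread:" + i + " computation took "
                    + computeNanos[i] / 1_000_000.0 + " ms");
        }
        return sums;
    }

    public static void main(String[] args) throws InterruptedException {
        runWithBarrier();
    }
}
```

Removing the `barrier.await()` call gives the unsynchronized variant, so the two timing patterns can be compared on the same machine.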

My question is: if the threads don't communicate at all, read from different parts of the disk, and perform the same amount of computation, why do they need to be synchronized before computing? My guess is that caching is involved, but I can't explain why.

NB: The machine on which I tested the code has more than 4 cores, and no other CPU-intensive processes were running. To measure the time I used perf4j like this:

    class MyThreads extends Thread{
        public void run() {
            {
                StopWatch stopWatch = new Log4JStopWatch();
                // some thread initializations

                // Every thread reads 2 files (its own files,
                // so node 0 will read A0.txt and B0.txt
                // and node 1 will read A1.txt and B1.txt).
                // The files are between 10 and 20 MB in size.
                // A's files contain different information for different nodes (A0.txt != A1.txt),
                // but B's files are the same (B0.txt has
                // the same info as B1.txt). This is just a scenario.

                // Each thread keeps the data it has read
                // in memory.
                // Again, I know B could be shared since
                // it has the same info in both threads, but it's not.
                stopWatch.stop("initialization");
                // barrier
            }

            {
                StopWatch stopWatch = new Log4JStopWatch();
                // simple computation on the data retrieved
                // (addition, multiplication, etc)
                // I assume there is no need to synchronize
                // the threads since they apply operations on their own data.
                // Here, every thread executes the same number of operations
                stopWatch.stop("computation");
            }

            {
                // Writing the results to different files. This phase is unimportant.
            }
        }

        public static void main(String args[]) {
            // start 4 threads
        }
    }
