nbarraille

Reputation: 10023

Why is my multi threading not efficient?

I've designed a class that fills an array with integers using a varying number of threads, in order to see the benefit of multithreading. But according to my results, there is none...

The idea: fill an array of 100000000 integers with the value "1", starting with 1 thread (one thread fills the whole array) and going up to 100 threads (each thread fills a sub-array of size 100000000/nbThreads).

Example: With 10 threads, I create 10 threads and each one fills a sub-array of 10000000 integers.

Here is my code:

public class ThreadedArrayFilling extends Thread{
    private int start;
    private int partitionSize;
    public static int[] data;
    public static final int SIZE = 100000000;
    public static final int NB_THREADS_MAX = 100;


    public static void main(String[] args){
        data = new int[SIZE];
        long startTime, endTime;
        int partition, startIndex, j;
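        ThreadedArrayFilling[] threads;

        for(int i = 1; i <= NB_THREADS_MAX; i++){
            startTime = System.currentTimeMillis();
            partition = SIZE / i;
            startIndex = 0;
            threads = new ThreadedArrayFilling[i];
            for(j = 0; j < i; j++){
                // The constructor starts the thread. The last thread also takes
                // the remainder when SIZE is not divisible by i.
                threads[j] = new ThreadedArrayFilling(startIndex, (j == i - 1) ? SIZE - startIndex : partition);
                startIndex += partition;
            }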
            for(j = 0; j < i; j++){
                try {
                    threads[j].join();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            endTime = System.currentTimeMillis();       
            System.out.println(i + " THREADS: " + (endTime - startTime) + "ms");
        }
    }

    public ThreadedArrayFilling(int start, int size){
        this.start = start;
        this.partitionSize = size;
        this.start();
    }

    public void run(){
        for(int i = 0; i < this.partitionSize; i++){
            data[this.start + i] = 1;
        }
    }

    public static String display(int[] d){
        String s = "[";

        for(int i = 0; i < d.length; i++){
            s += d[i] + ", ";
        }

        s += "]";
        return s;
    }

}

And here are my results:

1 THREADS: 196ms
2 THREADS: 208ms
3 THREADS: 222ms
4 THREADS: 213ms
5 THREADS: 198ms
6 THREADS: 198ms
7 THREADS: 198ms
8 THREADS: 198ms
9 THREADS: 198ms
10 THREADS: 206ms
11 THREADS: 201ms
12 THREADS: 197ms
13 THREADS: 198ms
14 THREADS: 204ms
15 THREADS: 199ms
16 THREADS: 203ms
17 THREADS: 234ms
18 THREADS: 225ms
19 THREADS: 235ms
20 THREADS: 235ms
21 THREADS: 234ms
22 THREADS: 221ms
23 THREADS: 211ms
24 THREADS: 203ms
25 THREADS: 206ms
26 THREADS: 200ms
27 THREADS: 202ms
28 THREADS: 204ms
29 THREADS: 202ms
30 THREADS: 200ms
31 THREADS: 206ms
32 THREADS: 200ms
33 THREADS: 205ms
34 THREADS: 203ms
35 THREADS: 200ms
36 THREADS: 206ms
37 THREADS: 200ms
38 THREADS: 204ms
39 THREADS: 205ms
40 THREADS: 201ms
41 THREADS: 206ms
42 THREADS: 200ms
43 THREADS: 204ms
44 THREADS: 204ms
45 THREADS: 206ms
46 THREADS: 203ms
47 THREADS: 204ms
48 THREADS: 204ms
49 THREADS: 201ms
50 THREADS: 205ms
51 THREADS: 204ms
52 THREADS: 207ms
53 THREADS: 202ms
54 THREADS: 207ms
55 THREADS: 207ms
56 THREADS: 203ms
57 THREADS: 203ms
58 THREADS: 201ms
59 THREADS: 206ms
60 THREADS: 206ms
61 THREADS: 204ms
62 THREADS: 201ms
63 THREADS: 206ms
64 THREADS: 202ms
65 THREADS: 206ms
66 THREADS: 205ms
67 THREADS: 207ms
68 THREADS: 210ms
69 THREADS: 207ms
70 THREADS: 203ms
71 THREADS: 207ms
72 THREADS: 205ms
73 THREADS: 203ms
74 THREADS: 211ms
75 THREADS: 202ms
76 THREADS: 207ms
77 THREADS: 204ms
78 THREADS: 212ms
79 THREADS: 203ms
80 THREADS: 210ms
81 THREADS: 206ms
82 THREADS: 205ms
83 THREADS: 203ms
84 THREADS: 203ms
85 THREADS: 209ms
86 THREADS: 204ms
87 THREADS: 206ms
88 THREADS: 208ms
89 THREADS: 263ms
90 THREADS: 216ms
91 THREADS: 230ms
92 THREADS: 216ms
93 THREADS: 230ms
94 THREADS: 234ms
95 THREADS: 234ms
96 THREADS: 217ms
97 THREADS: 229ms
98 THREADS: 228ms
99 THREADS: 215ms
100 THREADS: 232ms

What did I miss?

EDIT: Additional info:

My machine has a dual-core CPU.

Expectations:

But the results match none of my expectations. Are my expectations wrong, or is there a problem with my algorithm?

Upvotes: 11

Views: 4452

Answers (4)

Olof Forshell

Reputation: 3284

It is possible for two threads, each with its own CPU or core and working in unison, to complete a task more slowly than if just one thread did all the work. Both cores want their L1 and L2 caches to write data to memory, which is fine. However, they soon saturate the shared L3 cache, which then stalls additional writes until it has written an updated cache line to RAM and freed it to accept new writes.

To put it another way, the purpose of your threads is not to perform any processing to speak of but to fill system RAM. System RAM is slow and, as you can see by comparing your one-thread result with the two-thread result, the write-to-RAM capacity is already used up by one thread, so two threads cannot be faster.
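As a rough sanity check on this argument, the question's timing can be turned into an effective write rate. This is only a sketch: it assumes 4 bytes per int and uses the 196 ms single-thread figure from the question, and the class and method names are made up for illustration.

```java
// Rough arithmetic only: converts the question's single-thread timing
// into an effective write rate. WriteRate and bytesPerSecond are
// illustrative names, not part of the original code.
class WriteRate {
    // Bytes written per second when `elements` ints are filled in `millis` ms.
    static double bytesPerSecond(long elements, long millis) {
        long bytes = elements * Integer.BYTES; // 4 bytes per int
        return bytes / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // 100000000 ints in 196 ms, as reported for 1 thread.
        double rate = bytesPerSecond(100_000_000L, 196);
        System.out.printf("%.2f GB/s%n", rate / 1e9);
    }
}
```

That works out to roughly 2 GB/s, and the figure barely moves across all 100 runs. Whether that is the actual ceiling of the memory system depends on the machine, so treat it as consistent with the explanation above rather than as proof.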

Your threads are so small that in all probability they will reside in the L1 cache and therefore not require fetches from system RAM, which would otherwise hamper your capacity to do RAM writes. Your ability to write to RAM is the same whether you have 1 or 100 threads trying to do it. The more threads you have, though, the more thread administration overhead you will have. This is negligible for a few threads but grows with every additional thread and eventually becomes noticeable.

Upvotes: 0

Michael Borgwardt

Reputation: 346536

With two cores, the best performance you could possibly expect is 2 threads taking half the time of one thread. Any additional threads only create useless overhead after that, and even that assumes you're completely CPU-bound, which you actually are not.

The question is why you're not seeing an improvement when going from 1 to 2 threads. The reason is probably that your program is not CPU-bound but memory-bound. Your bottleneck is main memory access, and the 2 threads are just taking turns writing to main memory. The actual CPU cores are doing nothing most of the time. You'll see the expected difference if, instead of doing little actual work on a large area of memory, you do a lot of CPU-intensive work on a small amount of memory, because then each CPU core can work completely inside its cache.
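To try the CPU-bound case, something like the following could be used. It is a minimal sketch, not a rigorous benchmark: the class name, the `spin` and `timeWith` helpers, and the iteration count are all assumptions made up for this example. Each thread only does arithmetic on a local variable, so almost nothing goes to main memory.

```java
// Sketch of a CPU-bound workload: the same total amount of arithmetic is
// split over 1..4 threads. All names and constants are illustrative.
class CpuBoundDemo {
    // Pure arithmetic on a local variable (a linear congruential step),
    // so the work stays in registers and cache.
    static long spin(long iterations) {
        long x = 1;
        for (long i = 0; i < iterations; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    // Splits totalIterations evenly over nbThreads and returns elapsed ms.
    static long timeWith(int nbThreads, long totalIterations) {
        long[] sink = new long[nbThreads]; // keeps results so the loop is not optimized away
        Thread[] threads = new Thread[nbThreads];
        long perThread = totalIterations / nbThreads;
        long t0 = System.nanoTime();
        for (int t = 0; t < nbThreads; t++) {
            final int id = t;
            threads[t] = new Thread(() -> sink[id] = spin(perThread));
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        long work = 2_000_000_000L;
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " THREADS: " + timeWith(n, work) + "ms");
        }
    }
}
```

If the reasoning above holds, the time should drop noticeably from 1 to 2 threads on a dual core and then level off. Actual numbers depend on the JVM warm-up and the machine.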

Upvotes: 21

Jens Schauder

Reputation: 81998

The task you perform inside each thread is so tiny that the time it takes is outweighed by the overhead of your setup.

Do some heavy calculation (e.g. run an approximation of PI and put it in the array); then you will see a benefit from multiple threads, but only up to approximately the number of cores your machine has.

Or do something that waits for something external (reading from a database, scraping data from a website). This can perform better as long as some threads do something useful while others are waiting.
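One way this suggestion could look, keeping the partitioning idea from the question: each array slot gets its own (deliberately slow) approximation of PI. This is a sketch under assumptions; the Leibniz series, the array size, the number of terms, and all names are choices made for this example only.

```java
// Sketch: same partitioned filling as in the question, but each element
// requires real CPU work. Names and sizes are illustrative.
class PiFilling {
    // Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...). Slow on purpose.
    static double approximatePi(int terms) {
        double sum = 0.0;
        for (int k = 0; k < terms; k++) {
            double term = 1.0 / (2 * k + 1);
            sum += (k % 2 == 0) ? term : -term;
        }
        return 4.0 * sum;
    }

    // Fills data with nbThreads threads, each owning a contiguous slice.
    // Returns the elapsed time in milliseconds.
    static long fill(double[] data, int nbThreads, int terms) {
        Thread[] threads = new Thread[nbThreads];
        int partition = data.length / nbThreads;
        long t0 = System.nanoTime();
        for (int t = 0; t < nbThreads; t++) {
            final int from = t * partition;
            // The last thread also takes the remainder of the division.
            final int to = (t == nbThreads - 1) ? data.length : from + partition;
            threads[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    data[i] = approximatePi(terms);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        double[] data = new double[2000];
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " THREADS: " + fill(data, n, 100_000) + "ms");
        }
    }
}
```

The array is tiny compared to the one in the question, so memory traffic should no longer dominate and the thread count should start to matter, up to the number of cores.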

Upvotes: 4

Gugussee

Reputation: 1723

Multithreading is super efficient when your software is CPU-bound: there are a lot of applications which are single-threaded, and you can see them painfully underusing modern CPUs by maxing out only one core (this appears very clearly in CPU monitors).

However, there's no point in launching many more threads than the number of (virtual) CPUs available.

Correctly multi-threaded applications that do, for example, number crunching, do create a number of worker threads that is related to the number of (virtual) CPUs available to the JVM.
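For illustration, sizing the worker count from the JVM could look like this. `Runtime.availableProcessors()` and `Executors.newFixedThreadPool` are standard library calls; the class name, the `fillWithPool` helper, and the way the array is sliced are assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one worker per (virtual) CPU reported by the JVM, instead of
// an arbitrary thread count. Names are illustrative.
class SizedWorkerPool {
    // Fills a new array of the given size with 1s, one slice per worker.
    static int[] fillWithPool(int size) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int[] data = new int[size];
        int partition = (size + workers - 1) / workers; // ceiling division
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(size, w * partition);
                final int to = Math.min(size, from + partition);
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        data[i] = 1;
                    }
                }));
            }
            // Waiting on each Future also makes the workers' writes visible here.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return data;
    }

    public static void main(String[] args) {
        int[] data = fillWithPool(1_000_000);
        System.out.println("workers: " + Runtime.getRuntime().availableProcessors()
                + ", last element: " + data[data.length - 1]);
    }
}
```

Filling an array is still memory-bound, so this only shows how to pick the thread count; the speedup itself would come from putting CPU-heavy work inside each task.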

Upvotes: 9
