user3267256
user3267256

Reputation: 113

When will 1 thread run faster than multiple threads run concurrently

For an assignment I must implement a Java program that when run with multiple threads is actually slower than with just 1. I understand that creating a thread takes some overhead but in this example I am dealing with a large array 20,000x20,000. There are no dependencies so the benefits of creating 4 smaller chunks and running them concurrently should always out way the cost of creating a thread right?

for (int i = 0; i < numThreads; i++) {

// for each iteration of the body's for loop,
// calculate the starting and ending indexes
// of the ith chunk of the y dimension of
// the anArray array
final int indexStart = i * chunkSize;
final int indexEnd = (i + 1) * chunkSize;

// each "execute" method of the Executor class
// creates a new object of type Runnable ...
// be careful with the parentheses here ... the
// entire next code block is being passes
// as a parameter to the execute method
ex.execute(new Runnable() {

    // The run() method is declared abstract in the Runnable
    // class, so it MUST be overriden. The body of the
    // run method is the "code" that each thread executes
    @Override
    public void run() {

        for (int j=0; j<anArray.length; j++){
        for (int k = indexStart; k < indexEnd; k++){
                            anArray[j][k] = anArray[j][k] * anArray[j][k] / 3.45 * Math.sqrt(anArray[j][k]);

                        }

        } // end for

    } // end run
    }
    );
}

our task is to modify only the innermost for loop we can do what ever we want in there but it has to make the runtime slower when executed with more threads. with an upper limit of 8 threads. My real question is what can cause more over head when implementing multiple threads. I have done some research and found that you can use most of the cpu with a single thread so creating more wont help, how is this possible.

Upvotes: 2

Views: 9966

Answers (2)

Stephen C
Stephen C

Reputation: 719679

When will 1 thread run faster than multiple threads run concurrently.

Here are some:

  1. If you are creating your own threads: when the time taken to create and start N threads and run M tasks on each is greater than the time to run N * M tasks on a single thread. Hint: starting a Java thread is expensive.

  2. If you are using an Executor when the time taken to schedule a task is large taken to perform the task.

  3. When you have too many threads relative to the number of cores. Hint: the speedup you can get by multi-threading (for compute-bound tasks) is limited by the number of cores, not the number of threads.

  4. When the tasks have inherent concurrency bottlenecks, such as accessing / updating a common synchronized data structure.

  5. When the tasks involve accessing a large number of memory cells across multiple threads and you get lots of memory cache misses and memory contention.

  6. When you have made a mistake in your benchmarking; e.g. when you don't take proper account of "JVM warmup" effects.


In this case, I think multi threading is better than single thread as we have more cores. 100 Thread 1 Core VS 100 Threads 4 cores.

If you have 4 cores, then running 100 threads in this example won't give you any more speedup than 4 threads. Now add the fact that you have the overheads of starting 96 threads that are not helping ... and that could explain why multiple (100) threads is slower than a single thread.

Also, your example appears likely to generate a lot of cache misses and memory traffic. It looks you will be reading and writing every cell in an array of arrays with 400 million elements. Memory contention will be a bottleneck.

Upvotes: 2

Peter Lawrey
Peter Lawrey

Reputation: 533880

I have done some research and found that you can use most of the cpu with a single thread so creating more wont help, how is this possible.

Multiple threads work best when they run independently. This means that any over utilisation of a shared resource will limit or even make using multiple threads slower.

Say you have a Socket with 4 cores. This means you have 4 cores with 32 KB of L1 cache each. If you use more than this amount of memory, they have to use the L2 cache which is around 3-4x slower. But these are only 256 KB of memory. If you use more than this they have to use the one and only one L3 cache. i.e. use more than 1 MB of memory and your application no longer scales and could be slower.

In your case, you are also using the Floating Point unit esp Math.sqrt which takes quite a bit of CPU. There is only one FPU per core so you are likely to find that using hyper threading won't help much. (perhaps <10%)

In short, given your floating point operation is pretty expensive, I would expect you should get good scalability up to the number of cores you have. As you get more cores, at some point your L3 cache will become the bottleneck. e.g. for 18 cores, you might find this a problem.

Upvotes: 4

Related Questions