mc29
mc29

Reputation: 85

Multi threaded matrix multiplication performance issue

I am using java for multi threaded multiplication. I am practicing multi threaded programming. Following is the code that I took from another post of stackoverflow.

public class MatMulConcur {

private final static int NUM_OF_THREAD =1 ;
private static Mat matC;

public static Mat matmul(Mat matA, Mat matB) {
matC = new Mat(matA.getNRows(),matB.getNColumns());
return mul(matA,matB);
}

private static Mat mul(Mat matA,Mat matB) {

int numRowForThread;
int numRowA = matA.getNRows();
int startRow = 0;

Worker[] myWorker = new Worker[NUM_OF_THREAD];

for (int j = 0; j < NUM_OF_THREAD; j++) {
    if (j<NUM_OF_THREAD-1){
        numRowForThread = (numRowA / NUM_OF_THREAD);
    } else {
        numRowForThread = (numRowA / NUM_OF_THREAD) + (numRowA % NUM_OF_THREAD);
    }
    myWorker[j] = new Worker(startRow, startRow+numRowForThread,matA,matB);
    myWorker[j].start();
    startRow += numRowForThread;
}

for (Worker worker : myWorker) {
    try {
        worker.join();
    } catch (InterruptedException e) {

    }
  }
  return matC;
 }

private static class Worker extends Thread {

private int startRow, stopRow;
private Mat matA, matB;

public Worker(int startRow, int stopRow, Mat matA, Mat matB) {
    super();
    this.startRow = startRow;
    this.stopRow = stopRow;
    this.matA = matA;
    this.matB = matB;
}

@Override
public void run() {
    for (int i = startRow; i < stopRow; i++) {
        for (int j = 0; j < matB.getNColumns(); j++) {
            double sum = 0;
            for (int k = 0; k < matA.getNColumns(); k++) {
                sum += matA.get(i, k) * matB.get(k, j);
            }
            matC.set(i, j, sum);
        }
    }
  }
}

I ran this program for 1,10,20,...,100 threads but performance is decreasing instead. Following is the time table

  1. Thread 1 takes 18 Milliseconds
  2. Thread 10 takes 18 Milliseconds
  3. Thread 20 takes 35 Milliseconds
  4. Thread 30 takes 38 Milliseconds
  5. Thread 40 takes 43 Milliseconds
  6. Thread 50 takes 48 Milliseconds
  7. Thread 60 takes 57 Milliseconds
  8. Thread 70 takes 66 Milliseconds
  9. Thread 80 takes 74 Milliseconds
  10. Thread 90 takes 87 Milliseconds
  11. Thread 100 takes 98 Milliseconds

Any Idea?

Upvotes: 1

Views: 1301

Answers (2)

Stephen C
Stephen C

Reputation: 718826

People think that using multiple threads will automatically (magically!) make any computation go faster. This is not so1.

There are a number of factors that can make multi-threading speedup less than you expect, or indeed result in a slowdown.

  1. A computer with N cores (or hyperthreads) can do computations at most N times as fast as a computer with 1 core. This means that when you have T threads where T > N, the computational performance will be capped at N. (Beyond that, the threads make progress because of time slicing.)

  2. A computer has a certain amount of memory bandwidth; i.e. it can only perform a certain number of read/write operations per second on main memory. If you have an application where the demand exceeds what the memory subsystem can achieve, it will stall (for a few nanoseconds). If there are many cores executing many threads at the same time, then it is the aggregate demand that matters.

  3. A typical multi-threaded application working on shared variables or data structures will either use volatile or explicit synchronization to do this. Both of these increase the demand on the memory system.

  4. When explicit synchronization is used and two threads want to hold a lock at the same time, one of them will be blocked. This lock contention slows down the computation. Indeed, the computation is likely to be slowed down if there was past contention on the lock.

  5. Thread creation is expensive. Even acquiring an existing thread from a thread pool can be relatively expensive. If the task that you perform with the thread is too small, the setup costs can outweigh the possible speedup.

There is also the issue that you may be running into problems with a poorly written benchmark; e.g. the JVM may not be properly warmed up before taking the timing measurements.


There is insufficient detail in your question to be sure which of the above factors is likely to affect your application's performance. But it is likely to be a combination of 1 2 and 5 ... depending on how many cores are used, how big the CPUs memory caches are, how big the matrix is, and other factors.


1 - Indeed, if this was true then we would not need to buy computers with lots of cores. We could just use more and more threads. Provided you had enough memory, you could do an infinite amount of computation on a single machine. Bitcoin mining would be a doddle. Of course, it isn't true.

Upvotes: 2

Max Vollmer
Max Vollmer

Reputation: 8598

Using multi-threading is not primarily for performance, but for parallelization. There are cases where parallelization can benefit performance, though.

Your computer doesn't have infinite resources. Adding more and more threads will decrease performance. It's like starting more and more applications, you wouldn't expect a program to run faster when you start another program, and you probably wouldn't be surprised if it runs slower.

Up to a certain point performance will remain constant (your computer still has resources to handle the demand), but at some point you reach the maximum your computer can handle and performance will drop. That's exactly what your result shows. Performance stays somewhat constant with 1 or 10 threads, and then drops steadily.

Upvotes: 0

Related Questions