Clint Hepworth

Reputation: 23

Asked to multithread matrix multiplication in C++11

As of now I have two threads, one for each of my functions. Axe and Sword are Matrix objects.

thread thrd1(Add, std::ref(Axe), std::ref(Sword), std::ref(Axe));
thread thrd2(Multiply, std::ref(Axe), std::ref(Sword), std::ref(Axe));

I am new to threading and don't quite understand it. Do I have to add threading to my Multiply function? Right now it is simply

//Multiply the matrices
void Multiply(Matrix &a, Matrix &b, Matrix &c){
    for (auto i=0; i<c.dx; ++i) {
        for (auto j=0; j<c.dy; ++j) {
            for (auto k=0; k<a.dy; ++k) {
                c.p[i][j] += a.p[i][k] * b.p[k][j];
            }
        }
    }
}

but I feel as if I need to add something else, since there is no decrease in time when I set the number of threads through OpenMP. Can anyone help me out?

Upvotes: 0

Views: 2930

Answers (2)

Z boson

Reputation: 33659

All you have to do is this:

void Multiply(Matrix &a, Matrix &b, Matrix &c) {
    #pragma omp parallel for
    for (int i=0; i<c.dx; ++i) {
        for (int j=0; j<c.dy; ++j) {
            for (int k=0; k<a.dy; ++k) {
                c.p[i][j] += a.p[i][k] * b.p[k][j];
            }
        }
    }
}

You probably don't want to worry about the number of threads. Just let OpenMP choose the default, which is normally the number of logical cores. However, if you have hyper-threading, it may help to set the number of threads to the number of physical cores, NOT the number of logical cores.
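If you do want to pin the thread count, a minimal sketch might look like the one below. It assumes a stand-in Matrix type (the question does not show its definition), and MultiplyWithThreads and its numThreads parameter are hypothetical names. The num_threads clause is standard OpenMP; setting the OMP_NUM_THREADS environment variable is the usual alternative when you don't want to recompile.

```cpp
#include <cassert>
#include <vector>

// Assumed stand-in for the question's Matrix: dx rows, dy columns, p[i][j] storage.
struct Matrix {
    int dx, dy;
    std::vector<std::vector<double>> p;
    Matrix(int rows, int cols)
        : dx(rows), dy(cols), p(rows, std::vector<double>(cols, 0.0)) {}
};

// Hypothetical variant of Multiply that requests a specific number of threads.
// Computes c += a * b; c must not be the same object as a or b.
void MultiplyWithThreads(const Matrix &a, const Matrix &b, Matrix &c, int numThreads) {
    assert(a.dy == b.dx && c.dx == a.dx && c.dy == b.dy);
    assert(numThreads > 0);
    // Each iteration of i writes only row i of c, so iterations are independent.
    #pragma omp parallel for num_threads(numThreads)
    for (int i = 0; i < c.dx; ++i) {
        for (int j = 0; j < c.dy; ++j) {
            for (int k = 0; k < a.dy; ++k) {
                c.p[i][j] += a.p[i][k] * b.p[k][j];
            }
        }
    }
}
```

With GCC or Clang this needs -fopenmp; without that flag the pragma is ignored and the loop simply runs serially.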

You might also want to try fusing the two outer loops, like this:

#pragma omp parallel for
for(int n=0; n<c.dx*c.dy; n++) {
    int i=n/c.dy;
    int j=n%c.dy;
    for (int k=0; k<a.dy; ++k) {
        c.p[i][j] += a.p[i][k] * b.p[k][j];
    }
}

However, when you read b.p[k][j] it's likely going to have many cache misses. A much better solution is to take the transpose of b and access the transpose as b.p[j][k].
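A rough sketch of the transpose idea, assuming a stand-in Matrix type with the same dx/dy/p members as in the question (its real definition isn't shown). Transpose and MultiplyTransposed are hypothetical helper names:

```cpp
#include <cassert>
#include <vector>

// Assumed stand-in for the question's Matrix: dx rows, dy columns, p[i][j] storage.
struct Matrix {
    int dx, dy;
    std::vector<std::vector<double>> p;
    Matrix(int rows, int cols)
        : dx(rows), dy(cols), p(rows, std::vector<double>(cols, 0.0)) {}
};

// Returns a new matrix t with t.p[j][i] == m.p[i][j].
Matrix Transpose(const Matrix &m) {
    Matrix t(m.dy, m.dx);
    for (int i = 0; i < m.dx; ++i)
        for (int j = 0; j < m.dy; ++j)
            t.p[j][i] = m.p[i][j];
    return t;
}

// Computes c += a * b, reading b through its transpose so that the inner
// loop walks both operands along a row (consecutive k) instead of down a column.
void MultiplyTransposed(const Matrix &a, const Matrix &b, Matrix &c) {
    assert(a.dy == b.dx && c.dx == a.dx && c.dy == b.dy);
    const Matrix bt = Transpose(b);
    #pragma omp parallel for
    for (int i = 0; i < c.dx; ++i) {
        for (int j = 0; j < c.dy; ++j) {
            double sum = 0.0;
            for (int k = 0; k < a.dy; ++k)
                sum += a.p[i][k] * bt.p[j][k];
            c.p[i][j] += sum;
        }
    }
}
```

The transpose costs one extra pass over b and a temporary copy, which is usually small next to the cubic cost of the multiplication itself.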

An even better solution is to use tiles/block matrix multiplication. See the following link for how to do that: "reading/writing a matrix with a stride much larger than its width causes a big loss in performance".
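To give an idea of what tiling looks like, here is a minimal sketch. It is not taken from the linked post. The Matrix type is an assumed stand-in for the one in the question, and MultiplyTiled and the default tile size of 32 are made up for illustration; the best tile size depends on your cache and has to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed stand-in for the question's Matrix: dx rows, dy columns, p[i][j] storage.
struct Matrix {
    int dx, dy;
    std::vector<std::vector<double>> p;
    Matrix(int rows, int cols)
        : dx(rows), dy(cols), p(rows, std::vector<double>(cols, 0.0)) {}
};

// Computes c += a * b one tile at a time, so the parts of a, b and c
// being worked on are more likely to stay in cache.
void MultiplyTiled(const Matrix &a, const Matrix &b, Matrix &c, int tile = 32) {
    assert(a.dy == b.dx && c.dx == a.dx && c.dy == b.dy);
    assert(tile > 0);
    // Threads are split over row tiles, so no two threads write the same row of c.
    #pragma omp parallel for
    for (int ii = 0; ii < c.dx; ii += tile) {
        const int iEnd = std::min(ii + tile, c.dx);
        for (int kk = 0; kk < a.dy; kk += tile) {
            const int kEnd = std::min(kk + tile, a.dy);
            for (int jj = 0; jj < c.dy; jj += tile) {
                const int jEnd = std::min(jj + tile, c.dy);
                // Multiply one tile of a by one tile of b.
                for (int i = ii; i < iEnd; ++i) {
                    for (int k = kk; k < kEnd; ++k) {
                        const double aik = a.p[i][k];
                        for (int j = jj; j < jEnd; ++j)
                            c.p[i][j] += aik * b.p[k][j];
                    }
                }
            }
        }
    }
}
```

The std::min calls handle matrices whose dimensions are not a multiple of the tile size.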

Upvotes: 1

Richard Vock

Reputation: 1468

First of all: OpenMP and std::thread/future/etc. are different things. If you want to use OpenMP, there are a few quite good tutorials out there to look for, but it boils down to one #pragma directive in front of your first loop I guess.

Now to the C++11 part: I guess from your question (it was quite unclear in that regard) that you pass your function to run in a thread. This will not decrease any computation time, since each function still runs entirely in one thread. Now guess what the "multi" in "multithreading" means...

What you want to do every time you write multi-threaded code is:

  1. Think about how you can split your work into (ideally equally sized) disjoint problems. Disjoint here means that whatever you are computing does not rely on results of other computations. In your case, note that a single result element of the matrix, or a whole column/row, can be computed independently of the others.

  2. Whatever such a "subcomputation" writes must be written to a location the other threads do not simultaneously write to. If shared writes cannot be avoided, there are ways to work around that (mutexes, for example), but often problems can be defined to be inherently independent in memory (as in your case, where each worker thread only has to write to one column, for example).

  3. Write a function that does such a subtask (for example, restrict your function to compute only one column that you pass as a parameter), create std::thread or std::future objects for all subtasks (the latter using std::async), pass them your subtask function with the corresponding parameters, and wait for them to finish (using thread::join for threads, or future::get/future::wait for futures).
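The three steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the Matrix type is a stand-in for the one in the question, MultiplyRows and MultiplyThreaded are hypothetical names, and the work is split by rows rather than columns because rows are contiguous in the assumed layout.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Assumed stand-in for the question's Matrix: dx rows, dy columns, p[i][j] storage.
struct Matrix {
    int dx, dy;
    std::vector<std::vector<double>> p;
    Matrix(int rows, int cols)
        : dx(rows), dy(cols), p(rows, std::vector<double>(cols, 0.0)) {}
};

// Subtask: compute only rows [rowBegin, rowEnd) of c += a * b.
void MultiplyRows(const Matrix &a, const Matrix &b, Matrix &c, int rowBegin, int rowEnd) {
    for (int i = rowBegin; i < rowEnd; ++i)
        for (int j = 0; j < c.dy; ++j)
            for (int k = 0; k < a.dy; ++k)
                c.p[i][j] += a.p[i][k] * b.p[k][j];
}

// Splits the rows of c into up to numThreads disjoint chunks, one thread per chunk.
void MultiplyThreaded(const Matrix &a, const Matrix &b, Matrix &c, int numThreads) {
    assert(a.dy == b.dx && c.dx == a.dx && c.dy == b.dy);
    assert(numThreads > 0);
    const int rowsPerThread = (c.dx + numThreads - 1) / numThreads;  // rounded up
    std::vector<std::thread> workers;
    for (int begin = 0; begin < c.dx; begin += rowsPerThread) {
        const int end = std::min(begin + rowsPerThread, c.dx);
        // a and b are only read (std::cref); c is written, but each thread
        // touches its own rows, so no mutex is needed.
        workers.emplace_back(MultiplyRows, std::cref(a), std::cref(b), std::ref(c), begin, end);
    }
    for (auto &w : workers)
        w.join();  // wait for every subtask to finish
}
```

A reasonable value for numThreads is std::thread::hardware_concurrency(), keeping in mind that it may return 0 when the count cannot be determined. On GCC and Clang you may need to compile with -pthread.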

Please be aware that writing multithreaded code for less trivial problems in any not purely functional language can quickly become quite complex. You should probably take some time and read some tutorials or books. As a start, maybe have a look at this YouTube playlist: https://www.youtube.com/playlist?list=PL5jc9xFGsL8E12so1wlMS0r0hTQoJL74M

Oh and before I forget it: In your function you don't need to write to either a or b and should therefore pass them by const reference. On the thread construction site you'd then have to use std::cref. Const correctness is very important when writing multi-threaded code.

Upvotes: 0
