dll
dll

Reputation: 161

How to do a matrix multiplication using <threads> and a 1-D array in c++?

I am trying to do a matrix multiplication using threads. But I do not get correct values. Since the matrix can be very large I use the heap memory. The matrix is thus stored in a 1-D array.

The matrix is always a square matrix thus the number of rows and the number of columns is equal to the square root of the array length. If the array length is 16 then the numbers of rows is 4 and the number of columns is also 4.

I can not use an std::vector so that's why the use of std::unique_ptr.

There are 4 threads and each of them receives 1/4 of the original array to work with. This doesn't work due to the nature of matrix multiplication and I can't seem to find the right solution. How can I split the task for 4 threads?

auto matrixmultiplication(float* &array1, float* &array2, int arrayLength) {
    unique_ptr<float[]> arrayOut(new float[arrayLength]);
    auto numberOfThreads = 4;
    auto widthMatrix = (int)sqrt(arrayLength);
    auto elementsPerThread = (int)sqrt(arrayLength / numberOfThreads);

    auto mul = [](auto* array1, auto* array2, auto* array3, auto dimension) {
        for (auto x = 0; x < dimension; x++) {
            for (auto y = 0; y < dimension; y++) {
                array3[dimension * x + y] = 0;
                for (auto z = 0; z < dimension; z++) {
                    array3[dimension * x + y] += array1[dimension * x + z] * array2[dimension * z + y];
                }
            }
        }
    };

    vector<thread> threads;
    for (auto i = 0; i < numberOfThreads; i++) {
        threads.push_back(
            thread(
                mul,
                array1 + i * elementsPerThread,
                array2,
                arrayOut.get() + i * elementsPerThread,
                elementsPerThread
            )
        );
    }
    for (auto &thread : threads) {
        thread.join();
    }
    return arrayOut;
};

Upvotes: 2

Views: 1353

Answers (2)

Akira
Akira

Reputation: 4473

For all threads I would start the processing from consecutive rows of the first matrix , i.e. the 0th thread will process the 0th row, the 1st will process 1st row, and so on to the nth thread.

After a thread processed a row it has to jump to the next row by the number of threads, i.e. if I have 2 threads, after the 0th finished processig the 0th row, it will jump to the 2nd row and process it.

Let's see it in a working example:

#include <iostream>
#include <memory>
#include <vector>
#include <thread>

// multiplies the specified row and column from specified matrices
void multiply(const int* m_1, const int* m_2,
        std::size_t size, std::size_t row, std::size_t col, int* m_res) {
    for(std::size_t i = 0; i < size; ++i)
        m_res[row * size + col] += m_1[row * size + i] * m_2[i * size + col];
}

int main() {
    constexpr int N = 3, THREAD_NUM = 2;

    // matrices to multiply and a matrix for result
    std::unique_ptr<int[]> A(new int[N * N] {
        11, 12, 13, 21, 22, 23, 31, 32, 33
    });
    std::unique_ptr<int[]> B(new int[N * N] {
        1, 0, 0, 0, 1, 0, 0, 0, 1
    });
    std::unique_ptr<int[]> C(new int[N * N] {});

    // create vector for running threads then assign threads to its elements
    std::vector<std::thread> thread_group(THREAD_NUM);

    for(int thread_i = 0; thread_i < THREAD_NUM; ++thread_i)
        thread_group[thread_i] = std::thread([&, thread_i]() {

            // each thread stars from consecutive rows then steps by 
            // the number of threads
            for(int row = thread_i; row < N; row += THREAD_NUM) {
                for(int col = 0; col < N; ++col)
                    multiply(A.get(), B.get(), N, row, col, C.get());
            }
        });

    for(auto& t : thread_group)
        t.join();

    // show the result
    for(int i = 0; i < N; ++i) {
        for(int j = 0; j < N; ++j)
            std::cout << (j ? "\t" : "") << C[i * N + j];
        std::cout << std::endl;
    }
}

Upvotes: 1

KjMag
KjMag

Reputation: 2770

If you have two matrices that you want to multiply, let's call them A and B, you just need to split the matrix A row-wise into 4 parts and pass the parts to corresponding threads. When it comes to matrix B, you need to pass a reference to the whole matrix to each thread as you need all its elements to calculate each row of A*B. This is going to be thread-safe as you are going to only read from the matrix B without modyfing it.

Upvotes: 1

Related Questions