Reputation: 2147
Edit: My first code sample was wrong. Fixed with a simpler one.
I am implementing a C++ library for algebraic operations on large vectors and matrices. On x86-64 CPUs I found that OpenMP-parallel vector additions, dot products, etc. are not much faster than single-threaded versions: the parallel operations are only about -1% to 6% faster. I think this happens because of the memory bandwidth limitation.
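For example, an element-wise addition like the following barely speeds up for me (this is only a minimal sketch over plain arrays; the names a, b and n are illustrative, not my actual library code):

#include <cstddef>

// Sketch: a and b are plain double arrays of length n.
void add_vectors(double* a, const double* b, std::size_t n)
{
    // Each iteration loads 16 bytes and stores 8 bytes for a single
    // addition, so the loop saturates memory bandwidth long before it
    // saturates the cores; extra threads add very little.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; i++)
        a[i] += b[i];
}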
So, the question is: is there a real performance benefit for code like this:
void DenseMatrix::identity()
{
    assert(height == width);
    // The element index is computed from y and x instead of a shared
    // running counter, so every iteration is independent and safe to
    // run in parallel.
    #pragma omp parallel for if (height > OPENMP_BREAK2)
    for (unsigned int y = 0; y < height; y++)
        for (unsigned int x = 0; x < width; x++)
            elements[y * width + x] = (x == y) ? 1 : 0;
}
In this sample there is no serious drawback to using OpenMP. But when I work with OpenMP on sparse vectors and sparse matrices, I cannot use, for instance, *.push_back(), and then the question becomes serious. (Elements of sparse vectors are not contiguous like those of dense vectors, so parallel programming has a drawback: result elements can arrive at any time, not in order from lower to higher index.)
Upvotes: 0
Views: 640
Reputation: 9070
I don't think this is a problem of memory bandwidth. I clearly see a problem with r: r is accessed from multiple threads, which causes both data races and false sharing. False sharing can dramatically hurt your performance.
I'm wondering whether you even get the correct answer, because there are data races on r. Did you get the correct answer?
However, the solution is very simple. The operation performed on r is a reduction, which can easily be expressed with OpenMP's reduction clause.
Try simply appending reduction(+ : r) after #pragma omp parallel.
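Since the edited question no longer shows the loop that uses r, here is a minimal sketch of what I mean; the array a and its length n are illustrative, assuming r accumulates a sum:

// Sketch only: a and n stand in for whatever your loop actually reads.
double r = 0.0;
#pragma omp parallel for reduction(+ : r)
for (size_t i = 0; i < n; i++)
    r += a[i];
// Each thread accumulates into its own private copy of r; the private
// copies are combined into the shared r when the loop finishes, so there
// is no data race and no false sharing on r.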
(Note: addition on double is not associative, so you may see some precision differences compared with the result of the serial code.)
Upvotes: 1