Parallelism vs Threading - Performance

Question

I have been reading on the subject, but I haven't been able to find a concrete answer to my question. I am interested in using parallelism/multithreading to improve the performance of my game, but I have heard some contradictory claims. For example, that multithreading may not produce any improvement in execution speed for a game.

I have thought of two ways to do this:

Putting the rendering component into its own thread. There are some things I would need to change, but I have a good idea of what needs to be done.
Using OpenMP to parallelize the rendering function. I already have code to do so, so this might be the easier option.

Since this is a university assessment, the target hardware is my university's computers, which are multi-core (4 cores), so I am hoping to gain some additional efficiency using one of these techniques.

My question is therefore the following: which one should I prefer? Which normally produces the best results?

EDIT: The main function I mean to parallelize/multithread away:

void Visualization::ClipTransBlit ( int id, Vector2i spritePosition, FrameData frame, View *view )
{
    const Rectangle viewRect = view->GetRect ();
    BYTE *bufferPtr = view->GetBuffer ();

    Texture *txt = txtMan_.GetTexture ( id );
    Rectangle clippingRect = Rectangle ( 0, frame.frameSize.x, 0, frame.frameSize.y );

    clippingRect.Translate ( spritePosition );
    clippingRect.ClipTo ( viewRect );
    Vector2i negPos ( -spritePosition.x, -spritePosition.y );
    clippingRect.Translate ( negPos );

    if ( spritePosition.x < viewRect.left_ ) { spritePosition.x = viewRect.left_; }
    if ( spritePosition.y < viewRect.top_ ) { spritePosition.y = viewRect.top_; }

    if (clippingRect.GetArea() == 0) { return; }

    //clippingRect.Translate ( frameData );

    BYTE *destPtr = bufferPtr + ((abs(spritePosition.x) - abs(viewRect.left_)) + (abs(spritePosition.y) - abs(viewRect.top_)) * viewRect.Width()) * 4; // corner position of the sprite (top left corner)
    BYTE *tempSPtr = txt->GetData() + (clippingRect.left_ + clippingRect.top_ * txt->GetSize().x) * 4;

    int w = clippingRect.Width();
    int h = clippingRect.Height();
    int endOfLine = (viewRect.Width() - w) * 4;
    int endOfSourceLine = (txt->GetSize().x - w) * 4;

    for (int i = 0; i < h; i++)
    {
        for (int j = 0; j < w; j++)
        {
            if (tempSPtr[3] != 0)
            {
                memcpy(destPtr, tempSPtr, 4);
            }

            destPtr += 4;
            tempSPtr += 4;
        }

        destPtr += endOfLine;
        tempSPtr += endOfSourceLine;
    }

}

gordy · Accepted Answer

Instead of calling memcpy for each pixel, consider just assigning the value directly. The overhead of calling a function that many times could dominate the overall execution time of this loop. E.g.:

for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        if (tempSPtr[3] != 0)
        {
            *((DWORD*)destPtr) = *((DWORD*)tempSPtr);
        }

        destPtr += 4;
        tempSPtr += 4;
    }

    destPtr += endOfLine;
    tempSPtr += endOfSourceLine;
}

You could also avoid the conditional by employing one of the tricks mentioned here: avoiding conditionals. In such a tight loop, conditionals can be very expensive.
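One way such a trick might look for this loop is a mask-based select, sketched below. The function name `BlitRowBranchless` and its parameters are hypothetical and not part of the original code; it assumes 4-byte pixels with alpha in the last byte, as in the question. Whether it actually beats the branch depends on the compiler and on the data, so it is worth measuring.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `count` 4-byte pixels from src to dst,
// keeping the destination pixel wherever the source alpha byte is 0.
// The loop body contains no branch on the alpha value.
void BlitRowBranchless(std::uint8_t* dst, const std::uint8_t* src, int count)
{
    assert(count >= 0);
    for (int j = 0; j < count; ++j)
    {
        std::uint32_t s, d;
        // Fixed-size memcpy avoids alignment and aliasing problems;
        // compilers typically reduce it to a single load or store.
        std::memcpy(&s, src + j * 4, 4);
        std::memcpy(&d, dst + j * 4, 4);

        // mask is 0xFFFFFFFF when alpha != 0, otherwise 0.
        const std::uint32_t mask =
            0u - static_cast<std::uint32_t>(src[j * 4 + 3] != 0);

        // Select the source pixel where mask is set, else the old destination.
        const std::uint32_t out = (s & mask) | (d & ~mask);
        std::memcpy(dst + j * 4, &out, 4);
    }
}
```

Note that this version reads and writes every destination pixel, including the ones that stay unchanged, which matters if another thread may touch the same buffer.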

Edit: As to whether it's better to run several instances of ClipTransBlit concurrently or to parallelize ClipTransBlit internally, I would say that, generally speaking, it's better to implement parallelization at as high a level as possible, to reduce the overhead you incur by setting it up (creating threads, synchronizing them, etc.).

In your case, though, it looks like you're drawing sprites. If they were to overlap, then without additional synchronization your high-level threading might lead to nasty visual artifacts and even a race condition on checking the alpha value. In that case the low-level parallelism might be a better choice.
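A minimal sketch of the low-level option, assuming OpenMP is enabled at compile time (e.g. `-fopenmp` or `/openmp`): the rows of a single sprite are split across threads. `BlitRowsParallel` and its pitch parameters are hypothetical names. In the question's code the destination pitch would correspond to `viewRect.Width() * 4` and the source pitch to `txt->GetSize().x * 4`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-alone version of the two inner loops.
// dstPitch and srcPitch are the byte distances between consecutive rows.
void BlitRowsParallel(std::uint8_t* dst, std::size_t dstPitch,
                      const std::uint8_t* src, std::size_t srcPitch,
                      int w, int h)
{
    assert(w >= 0 && h >= 0);

    // Each iteration handles one row. Rows write to disjoint parts of dst,
    // so no locking is needed within a single sprite.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int i = 0; i < h; ++i)
    {
        // Row pointers are computed from i rather than carried over from
        // the previous iteration, so iterations are independent.
        std::uint8_t* d = dst + static_cast<std::size_t>(i) * dstPitch;
        const std::uint8_t* s = src + static_cast<std::size_t>(i) * srcPitch;

        for (int j = 0; j < w; ++j, d += 4, s += 4)
        {
            if (s[3] != 0)
            {
                std::memcpy(d, s, 4); // copy only non-transparent pixels
            }
        }
    }
}
```

Because sprites are still drawn one after another, overlapping sprites keep their draw order. For small sprites the cost of starting the parallel region may outweigh the gain, so this is something to profile rather than assume.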
