MKII

Reputation: 911

Parallelism vs Threading - Performance

I have been reading on the subject, but I haven't been able to find a concrete answer to my question. I am interested in using parallelism/multithreading to improve the performance of my game, but I have heard some contradictory claims, for example that multithreading may not produce any improvement in execution speed for a game.

I have thought of two ways to do this:

This being a Uni assessment, the target hardware is my Uni's computers, which are multi-core (4 cores), so I am hoping to gain some additional efficiency using one of those techniques.

My question is therefore the following: which one should I prefer? Which normally produces the best results?

EDIT: The main function I mean to parallelize/multithread away:

void Visualization::ClipTransBlit ( int id, Vector2i spritePosition, FrameData frame, View *view )
{
    const Rectangle viewRect = view->GetRect ();
    BYTE *bufferPtr = view->GetBuffer ();

    Texture *txt = txtMan_.GetTexture ( id );
    Rectangle clippingRect = Rectangle ( 0, frame.frameSize.x, 0, frame.frameSize.y );

    clippingRect.Translate ( spritePosition );
    clippingRect.ClipTo ( viewRect );
    Vector2i negPos ( -spritePosition.x, -spritePosition.y );
    clippingRect.Translate ( negPos );

    if ( spritePosition.x < viewRect.left_ ) { spritePosition.x = viewRect.left_; }
    if ( spritePosition.y < viewRect.top_ ) { spritePosition.y = viewRect.top_; }

    if (clippingRect.GetArea() == 0) { return; }

    //clippingRect.Translate ( frameData );

    BYTE *destPtr = bufferPtr + ((abs(spritePosition.x) - abs(viewRect.left_)) + (abs(spritePosition.y) - abs(viewRect.top_)) * viewRect.Width()) * 4; // corner position of the sprite (top left corner)
    BYTE *tempSPtr = txt->GetData() + (clippingRect.left_ + clippingRect.top_ * txt->GetSize().x) * 4;

    int w = clippingRect.Width();
    int h = clippingRect.Height();
    int endOfLine = (viewRect.Width() - w) * 4;
    int endOfSourceLine = (txt->GetSize().x - w) * 4;

    for (int i = 0; i < h; i++)
    {
        for (int j = 0; j < w; j++)
        {
            if (tempSPtr[3] != 0)
            {
                memcpy(destPtr, tempSPtr, 4);
            }

            destPtr += 4;
            tempSPtr += 4;
        }

        destPtr += endOfLine;
        tempSPtr += endOfSourceLine;
    }

}

Upvotes: 2

Views: 266

Answers (2)

gordy

Reputation: 9796

Instead of calling memcpy for each pixel, consider just setting the value directly. The overhead of calling a function that many times could be dominating the overall execution time of this loop. E.g.:

for (int i = 0; i < h; i++)
{
    for (int j = 0; j < w; j++)
    {
        if (tempSPtr[3] != 0)
        {
            *((DWORD*)destPtr) = *((DWORD*)tempSPtr);
        }

        destPtr += 4;
        tempSPtr += 4;
    }

    destPtr += endOfLine;
    tempSPtr += endOfSourceLine;
}

You could also avoid the conditional by employing one of the usual tricks for avoiding conditionals; in such a tight loop, conditionals can be very expensive.
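One such trick is to turn the alpha test into a bit mask and select between source and destination with bitwise operations. The following is only a minimal sketch, assuming 4-byte pixels with the alpha value in byte 3 as in the question; the function name BlitRowBranchless is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `count` 4-byte pixels from src to dst and keeps
// the destination pixel wherever the source alpha byte (byte 3) is zero.
// The loop body contains no if-statement.
void BlitRowBranchless(std::uint8_t *dst, const std::uint8_t *src, std::size_t count)
{
    assert(dst != nullptr && src != nullptr);

    for (std::size_t j = 0; j < count; ++j)
    {
        std::uint32_t s, d;
        // Fixed-size memcpy is the aliasing-safe way to read a pixel as one
        // 32-bit value; optimizing compilers typically emit a plain load.
        std::memcpy(&s, src + j * 4, 4);
        std::memcpy(&d, dst + j * 4, 4);

        // mask is 0xFFFFFFFF when alpha != 0 and 0x00000000 otherwise.
        const std::uint32_t mask =
            0u - static_cast<std::uint32_t>(src[j * 4 + 3] != 0);

        // Select the source pixel where mask is set, else the old destination.
        d = (s & mask) | (d & ~mask);

        std::memcpy(dst + j * 4, &d, 4);
    }
}
```

Note that this version reads and rewrites every destination pixel, including the transparent ones, so whether it is actually faster than the branch depends on the sprite data and should be measured.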

Edit: as to whether it's better to run several instances of ClipTransBlit concurrently or to parallelize ClipTransBlit internally, I would say that, generally speaking, it's better to implement parallelization at as high a level as possible, to reduce the overhead you incur by setting it up (creating threads, synchronizing them, etc.).

In your case, though, it looks like you're drawing sprites. If they overlap, then without additional synchronization your high-level threading might lead to nasty visual artifacts, and even a race condition on checking the alpha value. In that case the low-level parallelism might be a better choice, since the threads working on a single sprite can each be given their own rows of the destination buffer.
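A rough sketch of that low-level approach, assuming the blit is split by rows so that no two threads ever write the same destination pixel. The names BlitRows and BlitParallel, and the byte-stride parameters, are hypothetical and only loosely mirror the variables in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: blits rows [rowBegin, rowEnd) of a region that is
// w pixels wide. Strides are the distance in bytes between row starts.
void BlitRows(std::uint8_t *dst, int dstStride,
              const std::uint8_t *src, int srcStride,
              int w, int rowBegin, int rowEnd)
{
    for (int i = rowBegin; i < rowEnd; ++i)
    {
        std::uint8_t *d = dst + i * dstStride;
        const std::uint8_t *s = src + i * srcStride;
        for (int j = 0; j < w; ++j, d += 4, s += 4)
        {
            if (s[3] != 0) { std::copy(s, s + 4, d); }  // skip transparent pixels
        }
    }
}

// Splits the h rows into contiguous bands, one per thread. The bands are
// disjoint, so the workers need no locking between them.
void BlitParallel(std::uint8_t *dst, int dstStride,
                  const std::uint8_t *src, int srcStride,
                  int w, int h, int threadCount)
{
    assert(threadCount > 0);

    const int rowsPerThread = (h + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;

    for (int t = 0; t < threadCount; ++t)
    {
        const int begin = t * rowsPerThread;
        const int end = std::min(h, begin + rowsPerThread);
        if (begin >= end) { break; }  // fewer rows than threads
        workers.emplace_back(BlitRows, dst, dstStride, src, srcStride, w, begin, end);
    }

    for (std::thread &worker : workers) { worker.join(); }
}
```

Creating and joining threads for every sprite is exactly the setup overhead mentioned above, so for small sprites this may well be slower than the single-threaded loop. A real implementation would more likely keep a pool of worker threads alive across frames.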

Upvotes: 2

worldterminator

Reputation: 3076

Theoretically, they should produce the same effect. In practice, it might be quite different.

If you print out the assembly code of an OpenMP program, you will see that OpenMP simply calls a function for the code in the scope of a directive such as #pragma omp parallel .... It is similar to fork.

OpenMP is oriented towards parallel computing, whereas multithreading is more general. For example, if you want to write a GUI program, multithreading is necessary (some frameworks may hide it, but it still needs multiple threads). However, you would never want to implement that with OpenMP.
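To illustrate the parallel-computing style, here is a minimal sketch of how the blit loop from the question might look with an OpenMP directive. The function name BlitOpenMP and its stride parameters are hypothetical. It assumes the compiler is invoked with OpenMP enabled (for example -fopenmp with GCC or Clang); without that the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: blits a w x h region of 4-byte pixels, skipping
// pixels whose alpha byte (byte 3) is zero. Strides are in bytes.
void BlitOpenMP(std::uint8_t *dst, int dstStride,
                const std::uint8_t *src, int srcStride,
                int w, int h)
{
    assert(w >= 0 && h >= 0);

    // Each iteration of the outer loop touches a different destination row,
    // so the iterations can be distributed across threads without locking.
    #pragma omp parallel for
    for (int i = 0; i < h; ++i)
    {
        std::uint8_t *d = dst + i * dstStride;
        const std::uint8_t *s = src + i * srcStride;

        for (int j = 0; j < w; ++j)
        {
            if (s[j * 4 + 3] != 0)
            {
                for (int k = 0; k < 4; ++k) { d[j * 4 + k] = s[j * 4 + k]; }
            }
        }
    }
}
```

The directive leaves thread creation and work splitting to the OpenMP runtime, which is convenient for a loop like this but gives less control than managing threads by hand.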

Upvotes: 0
