The Dude

Reputation: 59

Parallel ray tracing in 16x16 chunks

My ray tracer is currently multithreaded: I divide the image into as many chunks as the system has hardware threads and render them in parallel. However, the chunks do not all take the same time to render, so in most runs about half of the run time is spent at only 50% CPU usage.

Code

    std::shared_ptr<bitmap_image> image = std::make_shared<bitmap_image>(WIDTH, HEIGHT);
    auto nThreads = std::thread::hardware_concurrency();

    std::cout << "Resolution: " << WIDTH << "x" << HEIGHT << std::endl;
    std::cout << "Supersampling: " << SUPERSAMPLING << std::endl;
    std::cout << "Ray depth: " << DEPTH << std::endl;
    std::cout << "Threads: " << nThreads << std::endl;

    std::vector<RenderThread> renderThreads(nThreads);
    std::vector<std::thread> tt;

    auto size = WIDTH*HEIGHT;

    auto chunk = size / nThreads;
    auto rem = size % nThreads;

    //launch threads
    for (unsigned i = 0; i < nThreads - 1; i++)
    {
        tt.emplace_back(std::thread(&RenderThread::LaunchThread, &renderThreads[i], i * chunk, (i + 1) * chunk, image));
    }
    tt.emplace_back(std::thread(&RenderThread::LaunchThread, &renderThreads[nThreads-1], (nThreads - 1)*chunk, nThreads*chunk + rem, image));

    for (auto& t : tt)
        t.join();

I would like to divide the image into 16x16 chunks (or something similar) and render them in parallel, so that as soon as a thread finishes one chunk it moves on to the next one, and so on. This would greatly improve CPU usage and reduce the run time.

How do I set up my ray tracer to render these 16x16 chunks in a multithreaded manner?

Upvotes: 0

Views: 458

Answers (2)

Spektre

Reputation: 51845

I do this a bit differently:

  1. Obtain the number of CPUs and/or cores

    You did not specify the OS, so you need to use your OS API for this. Search for "system affinity mask".

  2. Divide the screen among threads

    I divide the screen by lines instead of 16x16 blocks, so I do not need a queue or anything similar. Simply create one thread per CPU/core and have each thread process only the rays of its own horizontal lines. Each thread gets an ID number counting from zero, and with n CPUs/cores the lines belonging to each thread are:

    y = ID + i*n
    

    where i = {0, 1, 2, 3, ...}. Stop once y is greater than or equal to the vertical screen resolution. This access pattern has its advantages: for example, accessing the screen buffer via ScanLines will not cause conflicts between threads, because each thread accesses only its own lines.

    I also set an affinity mask for each thread so that it uses only its own CPU/core. This gives me a small boost because there is less process switching (but that was on older OS versions; it is hard to say what it does now).

  3. Synchronize threads

    Basically, you should wait until all threads have finished, and only then render the result on screen. Your threads can either stop, in which case you create new ones for the next frame, or drop into Sleep loops until rendering is requested again.

    I use the latter approach so that I do not need to create and configure the threads over and over again, but beware: Sleep(1) can sleep a lot longer than just 1 ms.
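
The interleaved line assignment from step 2 could be sketched roughly like this in portable C++. This is only an illustration under assumptions: renderRow and renderInterleaved are made-up names, the "rendering" just records which thread handled each row, and affinity masks and the Sleep-based thread reuse are left out because they are OS specific.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for tracing one image row; it only records
// which worker handled row y.
void renderRow(std::vector<int>& owner, std::size_t y, unsigned id) {
    owner[y] = static_cast<int>(id);
}

// Renders `height` rows with `n` threads. Thread `id` handles rows
// id, id + n, id + 2n, ... so no two threads ever write the same row.
std::vector<int> renderInterleaved(std::size_t height, unsigned n) {
    assert(n > 0);
    std::vector<int> owner(height, -1);
    std::vector<std::thread> threads;
    for (unsigned id = 0; id < n; ++id) {
        threads.emplace_back([&owner, height, n, id] {
            for (std::size_t y = id; y < height; y += n)
                renderRow(owner, y, id);
        });
    }
    // Step 3: wait for every thread before using the result.
    for (auto& t : threads) t.join();
    return owner;
}
```

Because neighbouring rows usually cost about the same, interleaving tends to spread expensive regions across all threads, although it does not rebalance work at run time.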

Upvotes: 0

Adrian McCarthy

Reputation: 47962

I assume the question is "How to distribute the blocks to the various threads?"

In your current solution, you're figuring out the regions ahead of time and assigning them to the threads. The trick is to turn this idea on its head. Make the threads ask for what to do next whenever they finish a chunk of work.

Here's an outline of what the threads will do:

void WorkerThread(Manager *manager) {
  while (auto task = manager->GetTask()) {
    task->Execute();
  }
}

So you create a Manager object that returns a chunk of work (in the form of a Task) each time a thread calls its GetTask method. Since that method will be called from multiple threads, you have to be sure it uses appropriate synchronization.

std::unique_ptr<Task> Manager::GetTask() {
    std::lock_guard guard(mutex);
    std::unique_ptr<Task> t;
    if (next_row < HEIGHT) {
        t = std::make_unique<Task>(next_row);
        ++next_row;
    }
    return t;
}

In this example, the manager creates a new task to ray trace the next row. (You could use 16x16 blocks instead of rows if you like.) When all the tasks have been issued, it just returns an empty pointer, which essentially tells the calling thread that there's nothing left to do, and the calling thread will then exit.
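
For 16x16 blocks, a sketch along the same lines might look like the following. Note the assumptions: TileManager, Tile, and renderTiled are illustrative names that are not in the code above, the per-pixel work is a placeholder, and an atomic counter is swapped in for the mutex because handing out the next index is the only shared state here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Half-open pixel bounds [x0, x1) x [y0, y1) of one block.
struct Tile { int x0, y0, x1, y1; };

// Hands out tiles in row-major order; fetch_add gives each caller a unique index.
class TileManager {
public:
    TileManager(int width, int height, int tileSize = 16)
        : width_(width), height_(height), tile_(tileSize),
          tilesX_((width + tileSize - 1) / tileSize),
          tilesY_((height + tileSize - 1) / tileSize) {}

    // Returns false once every tile has been issued.
    bool getTile(Tile& out) {
        int i = next_.fetch_add(1, std::memory_order_relaxed);
        if (i >= tilesX_ * tilesY_) return false;
        int tx = i % tilesX_, ty = i / tilesX_;
        // Clip the last column/row of tiles to the image size.
        out = { tx * tile_, ty * tile_,
                std::min((tx + 1) * tile_, width_),
                std::min((ty + 1) * tile_, height_) };
        return true;
    }

private:
    int width_, height_, tile_, tilesX_, tilesY_;
    std::atomic<int> next_{0};
};

// Placeholder "render": counts how often each pixel is visited.
std::vector<int> renderTiled(int width, int height, unsigned nThreads) {
    assert(width > 0 && height > 0 && nThreads > 0);
    std::vector<int> pixels(static_cast<std::size_t>(width) * height, 0);
    TileManager manager(width, height);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            Tile tile;
            while (manager.getTile(tile))  // ask for work until none is left
                for (int y = tile.y0; y < tile.y1; ++y)
                    for (int x = tile.x0; x < tile.x1; ++x)
                        ++pixels[static_cast<std::size_t>(y) * width + x];
        });
    }
    for (auto& t : threads) t.join();
    return pixels;
}
```

Each thread writes only to the pixels of the tile it was handed, so the image buffer itself needs no locking.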

If you made all the Tasks in advance and had the manager dole them out as they are requested, this would be a typical "work queue" solution. (General work queues also allow new Tasks to be added on the fly, but you don't need that feature for this particular problem.)
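
A minimal version of that pre-filled work queue might look like this. It is a sketch only: WorkQueue and drain are hypothetical names, tasks are plain std::function objects rather than the Task class above, and push is deliberately unsynchronized because all tasks are assumed to be added before any worker starts.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed work queue: fill it completely, then let the workers drain it.
class WorkQueue {
public:
    // Not thread-safe; call only before the workers are started.
    void push(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Pops the next task under the lock; returns false when the queue is empty.
    bool tryPop(std::function<void()>& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Runs every queued task on nThreads workers and returns once all have finished.
void drain(WorkQueue& queue, unsigned nThreads) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nThreads; ++i) {
        workers.emplace_back([&queue] {
            std::function<void()> task;
            while (queue.tryPop(task)) task();  // execute outside the lock
        });
    }
    for (auto& w : workers) w.join();
}
```

For the ray tracer, each pushed task would render one row or one 16x16 block.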

Upvotes: 1
