Haydn V. Harach

Reputation: 1275

Why is my rendering thread taking up 100% cpu?

So right now in my OpenGL game engine, when my rendering thread has literally nothing to do, it's taking up the maximum of what my CPU can give it. Windows Task Manager shows my application taking up 25% of the CPU (I have 4 hardware threads, so 25% is the maximum a single thread can take). When I don't start the rendering thread at all, I get 0-2% (which is worrying on its own, since all it's doing then is running an SDL input loop).

So, what exactly is my rendering thread doing? Here's some code:

Timer timer;

while (gVar.running)
{
   timer.frequencyCap(60.0);

   beginFrame();
   drawFrame();
   endFrame();
}

Let's go through each of those. Timer is a custom timer class I made using SDL_GetPerformanceCounter. timer.frequencyCap(60.0); is meant to ensure that the loop doesn't run more than 60 times per second. Here's the code for Timer::frequencyCap():

double Timer::frequencyCap(double maxFrequency)
{
    double duration;

    update();
    duration = _deltaTime;

      // If the frame finished early, sleep off the remainder of the period. //
    if (duration < (1.0 / maxFrequency))
    {
        double dur = ((1.0 / maxFrequency) - duration) * 1000000.0;
        this_thread::sleep_for(chrono::microseconds((int64)dur));
        update();
    }

    return duration;
}

void Timer::update(void)
{
    if (_freq == 0)
        return;

    _prevTicks = _currentTicks;
    _currentTicks = SDL_GetPerformanceCounter();

      // Some sanity checking here. //
      // The only way _currentTicks can be less than _prevTicks is if we've wrapped around to 0. //
      // So, we need some other way of calculating the difference. //
    if (_currentTicks < _prevTicks)
    {
          // If we take the difference between UINT64_MAX and _prevTicks, then add that to _currentTicks, we get the proper difference between _currentTicks and _prevTicks. //
        uint64 dif = UINT64_MAX - _prevTicks;

          // The +1 here prevents an off-by-1 error. In truth, the error would be pretty much indistinguishable, but we might as well be correct. //
        _deltaTime = (double)(_currentTicks + dif + 1) / (double)_freq;
    }
    else
        _deltaTime = (double)(_currentTicks - _prevTicks) / (double)_freq;
}

The next 3 functions are considerably simpler (at this stage):

void Renderer::beginFrame()
{
      // Perform a resize if we need to. //
    if (_needResize)
    {
        gWindow.getDrawableSize(&_width, &_height);
        glViewport(0, 0, _width, _height);
        _needResize = false;
    }

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
}

void Renderer::endFrame()
{
    gWindow.swapBuffers();
}

void Renderer::drawFrame()
{
}

The rendering thread was created using std::thread. The only explanation I can think of is that timer.frequencyCap somehow isn't working, except that I use the exact same function in my main thread and it idles at 0-2%.
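
For context, the thread is started roughly like this (a simplified sketch of my setup; renderLoop just stands in for the loop shown at the top):

#include <thread>

std::thread renderThread;

void startRenderThread()
{
    gVar.running = true;
    renderThread = std::thread(renderLoop);   // runs the while (gVar.running) loop above
}

void stopRenderThread()
{
    gVar.running = false;                     // tell the loop to exit
    if (renderThread.joinable())
        renderThread.join();                  // wait for the last frame to finish
}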

What am I doing wrong here?

Upvotes: 0

Views: 1048

Answers (2)

datenwolf

Reputation: 162164

If V-Sync is enabled and your program honors the swap interval, then seeing your program take up 100% is actually an artifact of how Windows measures CPU time. It's been a long-known issue: any time your program blocks in a driver context (which is what happens when OpenGL blocks on V-Sync), Windows accounts this as the program consuming CPU time, while it's actually just idling.

If you add a Sleep(1) right after the buffer swap, it will trick Windows into more sane accounting; on some systems even a Sleep(0) does the trick.
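
Applied to the endFrame() from your question it would look something like this (just a sketch; Sleep comes from <windows.h>, and the 1 ms is nothing magical):

#include <windows.h>   // for Sleep

void Renderer::endFrame()
{
    gWindow.swapBuffers();   // may block inside the driver waiting for V-Sync
    Sleep(1);                // nudges Windows into saner CPU-time accounting; Sleep(0) may be enough
}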

Anyway, the 100% is, most of the time, just a cosmetic problem.


In the past weeks I've done some exhaustive research on low-latency rendering (i.e. minimizing the time between user input and the corresponding photons coming out of the display), since I'm getting a VR headset soon. Here's what I found out regarding the timing of SwapBuffers: the sane solution to the problem is to time the frame rendering and add an artificial sleep before SwapBuffers so that you wake up only a few ms before the V-Sync. However, this is easier said than done, because OpenGL is highly asynchronous and explicitly adding syncs will hurt your throughput.
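
A rough sketch of that idea, assuming a 60 Hz display, a hand-tuned 2 ms safety margin, and a steady_clock::time_point member _frameStart recorded right after the previous swap (all of those are my own placeholders, not code from the question):

#include <chrono>
#include <thread>

void Renderer::endFrame()
{
    using namespace std::chrono;

    const auto vsyncPeriod  = duration<double, std::milli>(1000.0 / 60.0); // 60 Hz assumed
    const auto safetyMargin = duration<double, std::milli>(2.0);           // must be tuned

    // Sleep until shortly before the expected V-Sync, then swap.
    auto elapsed   = steady_clock::now() - _frameStart;      // CPU-side time spent this frame
    auto sleepTime = vsyncPeriod - safetyMargin - elapsed;
    if (sleepTime > steady_clock::duration::zero())
        std::this_thread::sleep_for(sleepTime);

    gWindow.swapBuffers();
    _frameStart = steady_clock::now();                        // start of the next frame
}

The tricky part, as said, is that the CPU-side timestamp doesn't tell you when the GPU actually finishes, so the safety margin has to absorb that uncertainty.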

Upvotes: 4

Spektre

Reputation: 51845

If you have a complex scene or non-optimized rendering:

  • you hit a bottleneck somewhere, or have an error in your GL code
  • then the framerate usually drops to around 20 fps (at least on NVidia), no matter the complexity of the scene
  • for very complex scenes it can drop even below that

Try this:

  1. Measure the time it takes to process this (see the sketch after this list):

    beginFrame();
    drawFrame();
    endFrame();

    • there you will see your fps limit
    • compare it to your scene complexity / HW capability
    • and decide whether it is a bug or just too complex a scene
    • try turning off some GL stuff
    • for example, last week I discovered that turning CULL_FACE off actually speeds up one of my non-optimized renderers about 10-100 times, which I still don't understand (it is old GL code)
  2. Check for GL errors.

  3. I do not see any glFlush()/glFinish() in your code

    • try to measure with glFinish();
  4. If you can't sort this out, you can still use a dirty trick like:

    • add Sleep(1); to your code
    • it will force your thread to sleep, so it will never use 100% of the CPU
    • the time it sleeps is 1 ms + scheduler granularity, so it also limits the target fps
    • you already use this_thread::sleep_for(chrono::microseconds((int64)dur));
    • I do not know that function; are you really sure it does what you think it does?
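
For points 1-3, a minimal measurement sketch using the same SDL performance counter as in the question (the glFinish() forces the queued GL work to actually complete before the clock stops, otherwise you mostly measure command submission):

Uint64 t0 = SDL_GetPerformanceCounter();

beginFrame();
drawFrame();
glFinish();                     // wait for the GPU to finish the queued work

Uint64 t1 = SDL_GetPerformanceCounter();

endFrame();                     // swap; with V-Sync on this can block too

double ms = 1000.0 * (double)(t1 - t0) / (double)SDL_GetPerformanceFrequency();
SDL_Log("frame took %.3f ms", ms);

// point 2: drain any pending GL errors
for (GLenum err; (err = glGetError()) != GL_NO_ERROR; )
    SDL_Log("GL error: 0x%04X", err);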

Upvotes: 1
