user2725937

Reputation: 11

OpenCL/GL Interop slow on nvidia/win but fast on linux?

The problem below is fixed in NVIDIA's new driver release 331.xx, currently available as a beta driver.

Thanks for all your comments!

I have a multi-platform application that does many fragment operations and GPGPU work on OpenGL textures. The application makes heavy use of GL/CL interop: each texture may be bound to an OpenCL image and manipulated using CL kernels.
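
For context, one manipulation step looks roughly like the sketch below (illustrative only; "kernel", "queue" and "clImage" stand in for the application's actual objects, they are not the test-case code):

// Illustrative sketch of one texture manipulation via GL/CL interop.
// "kernel", "queue" and "clImage" are placeholders, not the test-case code.
std::vector<cl::Memory> glObjects(1, clImage);

glFinish();                                  // ensure GL is done with the texture
queue.enqueueAcquireGLObjects(&glObjects);   // hand the texture over to OpenCL
kernel.setArg(0, clImage);
queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                           cl::NDRange(width, height), cl::NullRange);
queue.enqueueReleaseGLObjects(&glObjects);   // give it back to OpenGL
queue.finish();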

The problem is that the application runs fast on AMD cards, on both Linux and Windows. On NVIDIA cards it runs fast on Linux, but very slowly on Windows 7. The problem seems to be enqueueAcquireGLObjects and enqueueReleaseGLObjects. I have created a minimal sample that demonstrates the bad performance by simply:

  1. Creating 2 OpenGL textures (1600x1200 pixels, RGBA float)
  2. Creating 2 OpenCL images that share the 2 textures
  3. Repeatedly (50 times) calling acquire, release, finish

Results (mean time for executing acquire, release, finish): on NVIDIA/Windows 7 each cycle takes almost 30 times longer than on the other configurations.

I have tried several different NVIDIA drivers, from the older 295.73 to the current 326.80 beta, all showing the same behaviour.

My question now is: is the NVIDIA driver seriously broken, or am I doing something wrong here? The code runs fast on Linux, so it can't be a general problem with NVIDIA's OpenCL support. The code runs fast on AMD + Windows, so it can't be a problem with my code not being optimized for Windows. Optimizing the code by, for example, changing the CL images to read-only or write-only is pointless, since the performance hit is almost a factor of 30!

Below you can find the relevant code of my test case; I can provide the full source code, too.

relevant code for context creation

{ // initialize GLEW
  glewInit();
}

{ // initialize CL Context, sharing the current GL Context
  std::vector<cl::Platform> platforms;
  cl::Platform::get(&platforms);
  cl_context_properties cps[] = {
             CL_GL_CONTEXT_KHR,(cl_context_properties)wglGetCurrentContext(),
             CL_WGL_HDC_KHR,(cl_context_properties)wglGetCurrentDC(),
             CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0]()),
             0};
  std::vector<cl::Device> devices;
  platforms[0].getDevices((cl_device_type)CL_DEVICE_TYPE_GPU, &devices);
  context_ = new cl::Context(devices, cps, NULL, this);
  queue_ = new cl::CommandQueue(*context_, devices[0]);
}
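
As a sanity check (not part of the original test case), one could also verify that the CL device backing the shared context is really the device that owns the GL context, using the cl_khr_gl_sharing extension; a rough sketch:

// Sketch: query which CL device currently owns the GL context.
// Assumes cl_khr_gl_sharing; clGetExtensionFunctionAddressForPlatform is
// OpenCL 1.2 (older SDKs use clGetExtensionFunctionAddress instead).
clGetGLContextInfoKHR_fn getGLContextInfo = (clGetGLContextInfoKHR_fn)
    clGetExtensionFunctionAddressForPlatform(platforms[0](), "clGetGLContextInfoKHR");
cl_device_id interopDevice = 0;
if (getGLContextInfo)
  getGLContextInfo(cps, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
                   sizeof(interopDevice), &interopDevice, NULL);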

relevant code for creating textures and sharing CL images

width_ = 1600;
height_ = 1200;

float *data = new float[ width_*height_*4 ];

textures_.resize(2);
glGenTextures(2, textures_.data());

for (int i=0;i<2;i++) {
  glBindTexture(GL_TEXTURE_2D, textures_[i]);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
  // "data" pointer holds random/uninitialized data, do not care in this example
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width_,height_, 0, GL_RGBA, GL_FLOAT, data);
}

delete [] data;
{ // create shared CL Images
#ifdef CL_VERSION_1_2
  clImages_.push_back(cl::ImageGL(*context_, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, textures_[0]));
  clImages_.push_back(cl::ImageGL(*context_, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, textures_[1]));
#else
  clImages_.push_back(cl::Image2DGL(*context_, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, textures_[0]));
  clImages_.push_back(cl::Image2DGL(*context_, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, textures_[1]));
#endif
}

relevant code for one acquire, release, finish cycle

try {
  queue_->enqueueAcquireGLObjects( &clImages_ );
  queue_->enqueueReleaseGLObjects( &clImages_ );
  queue_->finish();
} catch (cl::Error &e) {
  std::cout << e.what() << std::endl;
}
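
For reference, the mean times were obtained by timing this cycle over the 50 iterations, roughly like the sketch below (the exact benchmark code may differ; std::chrono is used here only for illustration):

// Rough sketch of the timing loop (not the exact benchmark code).
// Requires <chrono> and <iostream>.
const int iterations = 50;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterations; i++) {
  queue_->enqueueAcquireGLObjects( &clImages_ );
  queue_->enqueueReleaseGLObjects( &clImages_ );
  queue_->finish();
}
auto stop = std::chrono::high_resolution_clock::now();
double meanMs = std::chrono::duration<double, std::milli>(stop - start).count() / iterations;
std::cout << "mean acquire/release/finish time: " << meanMs << " ms" << std::endl;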

Upvotes: 1

Views: 1714

Answers (1)

CaptainObvious

Reputation: 2565

I'm going to assume that, since you are using OpenGL, you display something on the screen after the OpenCL computation.

Based on that assumption, my first thought would be to check in the NVIDIA control panel whether VSync is enabled and, if it is, to disable it and retest.

As far as I recall, the default VSync settings differ between AMD and NVIDIA, which would explain the difference between the two GPUs.

Just in case, here is a post that explains how VSync can slow down rendering.
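
If VSync does turn out to be the cause, it can also be disabled from code on Windows through the WGL_EXT_swap_control extension (a sketch, assuming GLEW's wglew header is included so the entry point is available):

// Sketch: disable VSync via WGL_EXT_swap_control.
// GLEW exposes the check and the entry point when the extension is available.
if (WGLEW_EXT_swap_control)
  wglSwapIntervalEXT(0);   // 0 = do not wait for the vertical blank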

Upvotes: 1
