klkblake

Reputation: 377

image2d_t corrupted when passing to OpenCL kernel

I'm writing a pathtracer in Haskell and OpenCL, and I'm having an issue with passing an image2d_t to my kernel to write the output to. Namely, calling any of the get_image_* functions in OpenCL on the image2d_t returns nonsense values (usually either 0 or 2^24-1), and write_imagef does nothing. This only happens when running on the GPU -- the CPU runs it fine. Calling clGetImageInfo on the host returns the correct values. The Haskell bindings for OpenCL convert error codes to exceptions, so it's not a matter of forgetting to check for errors. clinfo reports my version as "OpenCL 1.2 AMD-APP (1084.2)". I should note that I experienced (and reported) multiple bugs causing the OpenCL compiler to segfault or fail to link, so this may be a result of that instead of a bug in my code.

I initialise OpenCL like this (hopefully should be relatively intelligible to people who don't know Haskell):

(platform:_) <- clGetPlatformIDs
(device:_) <- clGetDeviceIDs platform CL_DEVICE_TYPE_GPU
glContext <- glXGetCurrentContext
glDisplay <- glXGetCurrentDisplay
context <- clCreateContext [CL_GL_CONTEXT_KHR glContext, CL_GLX_DISPLAY_KHR glDisplay] [device] putStrLn
queue <- clCreateCommandQueue context device []
source <- readFile "pt.cl"
program <- clCreateProgramWithSource context source
clBuildProgram program [device] "-cl-strict-aliasing"
        `catch` (λe -> case (e :: CLError) of
                            CL_BUILD_PROGRAM_FAILURE -> putStrLn "Building OpenCL program failed:"
                                                     >> clGetProgramBuildLog program device >>= putStrLn
                                                     >> throw e
                            _                        -> return ())
kernel <- clCreateKernel program "sample"
pCorners <- mallocArray 4
buffer <- clCreateBuffer context [CL_MEM_READ_ONLY, CL_MEM_USE_HOST_PTR] (4*sizeOf (undefined :: V.Vec4F), castPtr pCorners)
clSetKernelArgSto kernel 1 buffer
tex@(TextureObject texid) <- head <$> (genObjectNames 1)
activeTexture $= TextureUnit 0
textureBinding Texture2D $= Just tex
textureFilter Texture2D $= ((Nearest, Nothing), Nearest)
textureWrapMode Texture2D S $= (Repeated, Clamp)
textureWrapMode Texture2D T $= (Repeated, Clamp)
texImage2D Nothing NoProxy 0 RGBA' (TextureSize2D initialWidth initialHeight) 0 (PixelData RGBA UnsignedByte nullPtr)
image <- clCreateFromGLTexture2D context [CL_MEM_READ_WRITE] gl_TEXTURE_2D 0 texid
clSetKernelArgSto kernel 2 image

And I call this (slightly simplified) to run the kernel and render the result:

clSetKernelArgSto kernel 0 position
pokeArray pCorners orientedCorners -- update the pCorners array
finish -- This is glFinish()
clEnqueueAcquireGLObjects queue [image] []
clEnqueueNDRangeKernel queue kernel [width, height] [] []
clEnqueueReleaseGLObjects queue [image] []
clFinish queue
drawElements TriangleFan 4 UnsignedInt offset0
swapBuffers

Finally, the test kernel:

__kernel void sample(float3 position, __constant float3 corner[4], image2d_t output) {
        write_imagef(output, (int2)(get_global_id(0), get_global_id(1)), (float4)(0, 0.5f, 1, 1));
}

The output of this is a fullscreen quad displaying a random uninitialized area of GPU memory. It should be a fullscreen cyan quad. I had some printfs in there to display the results of the get_image_* functions, but they've started causing the program to hang.

Upvotes: 4

Views: 1528

Answers (2)

Gerhard J.

Reputation: 16

I had a similar problem. After reordering the kernel arguments so that all image2d_t arguments come first, it worked. In particular, get_image_dim then returned the right results. I don't know if this is a bug. My GPU: ATI Radeon 7950.

Upvotes: 0

Thomas

Reputation: 3381

The OpenCL specification has rules about this: an image2d_t kernel argument requires an access qualifier.

There are two such qualifiers:

  • read_only (or __read_only)
  • write_only (or __write_only)

They are mutually exclusive, so you cannot read from and write to the same image in one kernel. This matters if you intend to do accumulation work with your image, which I suspect is the case for a Monte Carlo application such as path tracing. It is valid to omit the qualifier, in which case it simply defaults to read_only, but that is unfortunately the wrong choice for an output image.

The solution is to qualify your image argument with write_only. If you need to read from it too, use some sort of swap system, or use a global memory buffer, which can be read from and written to at the same time; that makes CL/GL interop a bit more difficult, though, and you lose the sampling features.
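Applied to the test kernel from the question, the fix is a one-word change to the signature (a sketch; only the write_only qualifier differs from the original):

```c
/* OpenCL C device code. With the image explicitly declared write_only,
 * write_imagef is legal on it and the get_image_* queries should return
 * the real image properties instead of garbage. */
__kernel void sample(float3 position,
                     __constant float3 corner[4],
                     write_only image2d_t output)
{
        write_imagef(output,
                     (int2)(get_global_id(0), get_global_id(1)),
                     (float4)(0, 0.5f, 1, 1));
}
```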

The reason it works on the CPU is, I assume, that the CPU has no read-only "texture memory": even though writing to the image is technically illegal, it is physically possible, and the runtime lets you do it. GPUs, on the other hand, do have read-only memory sections, which cannot be written to while the kernel is running no matter how hard you try (or perhaps the runtime for your GPU device is simply stricter).

*When I say runtime, I mean the OpenCL-enabled device, not your program, of course.

Upvotes: 2
