vtruant

Reputation: 293

Why does OpenCL run the code in the else branch of my condition when the condition is true?

I just started developing with OpenCL, and I'm really confused by the result I get when I launch my kernel.

kernel void clTest1(read_only image2d_t input, write_only image2d_t output)
{
    size_t  x           = get_global_id(0);
    size_t  y           = get_global_id(1);
    bool    yIsEven     = ((y % 2) == 0);
    int     modifiedY   = 0;

    if (yIsEven) {
        modifiedY = y;
    } else {
        modifiedY = (y - 1);
    }

    printf("Original Y:%i isEven:%i Modified Y: %i", y, yIsEven, modifiedY);

    write_imageui(output, (int2)(x,y), read_imageui(input, sampler, (int2)(x,modifiedY)));
}

The console log shows the following result:
Original Y:0 isEven:1 Modified Y: 0
Original Y:1 isEven:0 Modified Y: 0
Original Y:2 isEven:1 Modified Y: 1
Original Y:3 isEven:0 Modified Y: 2
Original Y:4 isEven:1 Modified Y: 3
Original Y:5 isEven:0 Modified Y: 4
Original Y:6 isEven:1 Modified Y: 5
Original Y:7 isEven:0 Modified Y: 6
Original Y:8 isEven:1 Modified Y: 7
Original Y:9 isEven:0 Modified Y: 8

Instead of the expected:
Original Y:0 isEven:1 Modified Y: 0
Original Y:1 isEven:0 Modified Y: 0
Original Y:2 isEven:1 Modified Y: 2
Original Y:3 isEven:0 Modified Y: 2
Original Y:4 isEven:1 Modified Y: 4
Original Y:5 isEven:0 Modified Y: 4
Original Y:6 isEven:1 Modified Y: 6
Original Y:7 isEven:0 Modified Y: 6
Original Y:8 isEven:1 Modified Y: 8
Original Y:9 isEven:0 Modified Y: 8

Thanks in advance.

Upvotes: 2

Views: 125

Answers (4)

vtruant

Reputation: 293

I fixed it by disabling OpenCL optimisation in the project's build settings:

OPENCL_OPTIMIZATION_LEVEL = 0

Upvotes: 0

Brian Cain

Reputation: 14619

When I built your code with slight edits (see below) using clang, I got the following LLVM IR:

$ cat go.ll 
; ModuleID = 'go.cl'
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-nvcl"

; Function Attrs: noinline nounwind
define void @clTest1(i32* nocapture readnone %input, i32* nocapture %output) #0 {
  %1 = tail call i64 @get_global_id(i32 0) #2
  %2 = tail call i64 @get_global_id(i32 1) #2
  %3 = and i64 %2, 1
  %4 = icmp eq i64 %3, 0
  %5 = add nsw i64 %2, 4294967295
  %modifiedY.0.in = select i1 %4, i64 %2, i64 %5
  %modifiedY.0 = trunc i64 %modifiedY.0.in to i32
  %6 = getelementptr inbounds i32* %output, i64 %1
  store i32 %modifiedY.0, i32* %6, align 4, !tbaa !3
  ret void
}

So you can see that %modifiedY.0.in gets the result of select(yIsEven, y, y - 1): %5 is y plus 4294967295 (2^32 - 1), which equals y - 1 once the value is truncated to 32 bits. Note that the argument order differs between the OpenCL select and the LLVM IR select.

Long story short: your code as written should work as described. If it doesn't, it's a bug in your OpenCL implementation or environment.

"slight edits":

$ diff -Naur go_orig.cl go.cl 
--- go_orig.cl  2014-04-29 08:14:24.453488798 -0500
+++ go.cl   2014-04-29 08:15:01.657489768 -0500
@@ -1,5 +1,5 @@

-kernel void clTest1(read_only image2d_t input, write_only image2d_t output)
+kernel void clTest1(read_only int *input, write_only int *output)
 {
     long  x           = get_global_id(0);
     long  y           = get_global_id(1);
@@ -12,7 +12,5 @@
         modifiedY = (y - 1);
     }

-    printf("Original Y:%i isEven:%i Modified Y: %i", y, yIsEven, modifiedY);
-
-    write_imageui(output, (int2)(x,y), read_imageui(input, sampler, (int2)(x,modifiedY)));
+    output[x] = modifiedY;
 }

Upvotes: 0

DarkZeros

Reputation: 8420

Can you try doing this:

 printf("Original Y:%i isEven:%i Modified Y: %i", (int)y, (int)yIsEven, (int)modifiedY);

I don't really trust that varargs work correctly in the GPU environment, or that bool values are converted as expected.

PS: Either way, this is an Apple OpenCL problem; the results should be fine.

Upvotes: 0

jprice

Reputation: 9925

Your code is fine, and works for me. Given that it works for you on the CPU but not on the GPU, this would appear to be a bug in Apple's OpenCL implementation, which is not at all uncommon.

I recommend you raise a bug using the Apple Bug Tracking System.

Upvotes: 1
