Reputation: 293
I just started developing with OpenCL, and I'm really confused by the result I get when I launch my kernel.
// Sampler used by read_imageui below (declaration assumed; any
// nearest-neighbour, unnormalised sampler works for this example).
constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                             CLK_ADDRESS_CLAMP_TO_EDGE |
                             CLK_FILTER_NEAREST;

kernel void clTest1(read_only image2d_t input, write_only image2d_t output)
{
    size_t x = get_global_id(0);
    size_t y = get_global_id(1);

    bool yIsEven = ((y % 2) == 0);

    // Round odd rows down to the even row above them.
    int modifiedY = 0;
    if (yIsEven) {
        modifiedY = y;
    } else {
        modifiedY = (y - 1);
    }

    printf("Original Y:%i isEven:%i Modified Y: %i\n", y, yIsEven, modifiedY);

    write_imageui(output, (int2)(x,y), read_imageui(input, sampler, (int2)(x,modifiedY)));
}
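I enqueue the kernel with a 2D global size covering the image, roughly like this (host-side sketch; variable names are illustrative):

// Host side (sketch): one work item per pixel, so get_global_id(0) is the
// pixel's x coordinate and get_global_id(1) its y coordinate.
size_t globalSize[2] = { imageWidth, imageHeight };  // e.g. 10 rows in the log below
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputImage);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputImage);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, globalSize, NULL, 0, NULL, NULL);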
If I look at the console log, I get the following result:
Original Y:0 isEven:1 Modified Y: 0
Original Y:1 isEven:0 Modified Y: 0
Original Y:2 isEven:1 Modified Y: 1
Original Y:3 isEven:0 Modified Y: 2
Original Y:4 isEven:1 Modified Y: 3
Original Y:5 isEven:0 Modified Y: 4
Original Y:6 isEven:1 Modified Y: 5
Original Y:7 isEven:0 Modified Y: 6
Original Y:8 isEven:1 Modified Y: 7
Original Y:9 isEven:0 Modified Y: 8
Instead of the expected:
Original Y:0 isEven:1 Modified Y: 0
Original Y:1 isEven:0 Modified Y: 0
Original Y:2 isEven:1 Modified Y: 2
Original Y:3 isEven:0 Modified Y: 2
Original Y:4 isEven:1 Modified Y: 4
Original Y:5 isEven:0 Modified Y: 4
Original Y:6 isEven:1 Modified Y: 6
Original Y:7 isEven:0 Modified Y: 6
Original Y:8 isEven:1 Modified Y: 8
Original Y:9 isEven:0 Modified Y: 8
Thanks in advance.
Upvotes: 2
Views: 125
Reputation: 293
I fixed it by disabling OpenCL optimisation in the project's build settings:
OPENCL_OPTIMIZATION_LEVEL = 0
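For anyone compiling the kernel at runtime rather than through Xcode's build settings, the equivalent (a minimal sketch, error handling omitted) is to pass -cl-opt-disable to clBuildProgram:

/* Runtime equivalent of OPENCL_OPTIMIZATION_LEVEL = 0: ask the OpenCL
   compiler not to optimise this program. */
cl_int err = clBuildProgram(program, 1, &device, "-cl-opt-disable", NULL, NULL);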
Upvotes: 0
Reputation: 14619
When I built your code with slight edits (see below) using clang, I got the LLVM IR below:
$ cat go.ll
; ModuleID = 'go.cl'
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-nvcl"
; Function Attrs: noinline nounwind
define void @clTest1(i32* nocapture readnone %input, i32* nocapture %output) #0 {
%1 = tail call i64 @get_global_id(i32 0) #2
%2 = tail call i64 @get_global_id(i32 1) #2
%3 = and i64 %2, 1
%4 = icmp eq i64 %3, 0
%5 = add nsw i64 %2, 4294967295
%modifiedY.0.in = select i1 %4, i64 %2, i64 %5
%modifiedY.0 = trunc i64 %modifiedY.0.in to i32
%6 = getelementptr inbounds i32* %output, i64 %1
store i32 %modifiedY.0, i32* %6, align 4, !tbaa !3
ret void
}
So you can see that %modifiedY.0.in gets the result of select(yIsEven, y, y - 1): the if/else has been folded into a single select. (The 4294967295 added in %5 acts as -1 once the result is truncated to i32.) Note that the order of arguments differs between the OpenCL select (condition last) and the LLVM IR select (condition first).
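To illustrate the ordering difference (a sketch, not code from the original kernel): OpenCL C's scalar select(a, b, c) evaluates to c ? b : a, so the kernel's if/else could be written as:

// OpenCL C: the condition is the LAST argument of select()
int modifiedY = select((int)(y - 1), (int)y, (int)yIsEven);  // yIsEven ? y : y - 1
// ...whereas LLVM IR's select takes the condition FIRST:
//   %sel = select i1 %yIsEven, i64 %y, i64 %yMinusOne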
Long story short: your code as written should work as described. If it doesn't, it's a bug in your OCL provider or environment somehow.
"slight edits":
$ diff -Naur go_orig.cl go.cl
--- go_orig.cl 2014-04-29 08:14:24.453488798 -0500
+++ go.cl 2014-04-29 08:15:01.657489768 -0500
@@ -1,5 +1,5 @@
-kernel void clTest1(read_only image2d_t input, write_only image2d_t output)
+kernel void clTest1(read_only int *input, write_only int *output)
 {
     long x = get_global_id(0);
     long y = get_global_id(1);
@@ -12,7 +12,5 @@
         modifiedY = (y - 1);
     }
-    printf("Original Y:%i isEven:%i Modified Y: %i\n", y, yIsEven, modifiedY);
-
-    write_imageui(output, (int2)(x,y), read_imageui(input, sampler, (int2)(x,modifiedY)));
+    output[x] = modifiedY;
 }
Upvotes: 0
Reputation: 8420
Can you try doing this:
printf("Original Y:%i isEven:%i Modified Y: %i", (int)y, (int)yIsEven, (int)modifiedY);
I don't really trust the vararg works ok for the GPU enviroment and bool types casting.
PD: In either way, is an Apple OCL problem, the results should be fine.
Upvotes: 0
Reputation: 9925
Your code is fine, and works for me. Given that it works for you on the CPU but not on the GPU, this would appear to be a bug in Apple's OpenCL implementation, which is not at all uncommon.
I recommend you raise a bug using the Apple Bug Tracking System.
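In the meantime, a possible workaround (a sketch; I haven't verified it against the affected driver) is to compute modifiedY without a branch, so there is no select for the optimiser to mishandle:

// Clearing the lowest bit rounds y down to the nearest even number,
// which is exactly what the if/else computes.
int modifiedY = (int)(y & ~(size_t)1);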
Upvotes: 1