Reputation: 2159
I compiled and ran the code below with the Intel DPC++ compiler. I get the correct result with the CPU selector, but the GPU selector gives garbage values.
The code initialises an array named data with all 1s. In the SYCL kernel, each element read through an accessor to this array is multiplied by 3 and stored in a result array. I then print a value from the result array, which I expect to contain all 3s, but I get junk values instead.
Since I got junk values when executing on the GPU, I tried the CPU selector, and there the code works without issues.
I tried this on both Linux and Windows. Compiler version: dpcpp 2021.3.
#include <iostream>
#include <CL/sycl.hpp>
#include <array>
using namespace std;
using namespace sycl;

int main() {
    sycl::gpu_selector selector;
    // using cpu selector as in the line below works
    // sycl::cpu_selector selector;
    sycl::queue q = sycl::queue(selector);
    std::cout << q.get_device().get_info<sycl::info::device::name>();
    constexpr int size = 3;
    std::array<int, size> data{1, 1, 1};
    std::array<int, size> resultarray;
    range<1> num_items{ size };
    buffer<int, 1> data_buff(data.data(), num_items);
    buffer<int, 1> result(resultarray.data(), num_items);
    q.submit([&](sycl::handler& cgh)
    {
        auto dataAccess = data_buff.get_access<access::mode::read_write>(cgh);
        auto resultAccess = result.get_access<access::mode::write>(cgh);
        cgh.parallel_for(num_items, [=](id<1> i)
        {
            resultAccess[i] = dataAccess[i] * 3;
        });
    }).wait();
    std::cout << "||" << resultarray[0] << "||"; // expected result ||3||
}
Can somebody explain why the code gives the wrong result on the GPU?
Upvotes: 1
Views: 625
Reputation: 591
You are not triggering a copy back to host. Presumably on CPU, your SYCL implementation just decides to operate directly on the input pointer, so you don't see the problem.
Think about this: How could the SYCL implementation know that resultarray is being used in your cout and that data has to be copied back? It cannot, because this memory access does not go through any SYCL construct. Therefore it cannot know that it has to copy data back. The wait() only causes the host to wait until the kernel has completed; it does not trigger copies.
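To make the reasoning concrete, here is a toy model in plain C++ (no SYCL required). ToyBuffer is a hypothetical stand-in, not the real sycl::buffer and not how any particular implementation works internally. It only mimics the relevant behaviour: work happens on a private copy, and the host array is updated when the buffer object is destroyed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SYCL buffer (NOT the real sycl::buffer).
// It keeps a private "device" copy and writes it back to the host
// pointer only in its destructor.
class ToyBuffer {
public:
    ToyBuffer(int* host, std::size_t n)
        : host_(host), device_(host, host + n) {
        assert(host != nullptr);
    }
    ~ToyBuffer() {
        // Write-back happens here, at the end of the buffer's lifetime.
        std::copy(device_.begin(), device_.end(), host_);
    }
    ToyBuffer(const ToyBuffer&) = delete;
    ToyBuffer& operator=(const ToyBuffer&) = delete;

    // Stand-in for the kernel: only the device copy is modified.
    void run_triple_kernel() {
        for (int& v : device_) v *= 3;
    }

private:
    int* host_;
    std::vector<int> device_;
};

// Reads the host array while the buffer is still alive: stale data.
inline int read_before_destruction() {
    std::array<int, 3> host{1, 1, 1};
    ToyBuffer buf(host.data(), host.size());
    buf.run_triple_kernel();
    return host[0];  // evaluated before buf's destructor runs
}

// Reads the host array after the buffer has gone out of scope.
inline int read_after_destruction() {
    std::array<int, 3> host{1, 1, 1};
    {
        ToyBuffer buf(host.data(), host.size());
        buf.run_triple_kernel();
    }  // destructor copies the device data back to host
    return host[0];
}
```

In this model read_before_destruction() still sees the old value 1, while read_after_destruction() sees 3. In your program the host array was never initialised, so the stale read shows up as junk rather than as 1.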
The most important ways of triggering the necessary copy are:
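- Buffer destruction. In your case, enclosing the buffers in a scope { } should suffice, since then the buffer would go out of scope before your cout, and the write back is triggered.
- Using a host_accessor instead of directly accessing resultarray.
- Submitting an explicit copy with handler::copy().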
Upvotes: 2