I have implemented a matrix class in C++ using two-dimensional vectors (vector<vector<float>>). I now want to speed the code up with GPGPU using OpenCL, but I keep running into problems. Please help me and give me some tips on how to do this.
My requirements are as follows:
Here is one of my code segments. In my kernel I try to add 10 to every element, but the output shows that only the elements of the first inner vector (vector[0][n]) change.
This is the relevant segment of my host program:
int in_vec_size = 100;
int out_vec_size = 100;
vector<vector<float>> in_vec(10, vector<float>(10));
vector<vector<float>> out_vec(10, vector<float>(10));
int k = 0;
// initialize the input vec
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < 10; j++)
    {
        in_vec[i][j] = k++;
        out_vec[i][j] = 0;
    }
}
// create buffers
cl::Buffer inBuff(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, in_vec_size*4, &in_vec[0][0]);
cl::Buffer outBuff(context, CL_MEM_WRITE_ONLY, out_vec_size*4, NULL);
// set kernel args
kernal.setArg(0, inBuff);
kernal.setArg(1, outBuff);
kernal.setArg(2, in_vec_size);
cl::CommandQueue queue(context, devices_gpu[0]);
queue.enqueueTask(kernal);
queue.enqueueWriteBuffer(inBuff, CL_TRUE, 0, in_vec_size*4, &in_vec[0][0]);
queue.enqueueReadBuffer(outBuff, CL_TRUE, 0, out_vec_size*4, &out_vec[0][0]);
for (int i = 0; i < 10; i++)
{
    for (int j = 0; j < 10; j++)
    {
        cout << out_vec[i][j] << endl;
    }
}
This is my kernel:
__kernel void add(__global float* in, __global float* out, int x)
{
    // i = get_global_id(0);
    for (int i = 0; i < x; i++)
    {
        out[i] = in[i] + 10;
    }
}
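You're using a multidimensional vector.
The outer vector stores its inner vectors contiguously, but each of those elements is a vector object (typically just a pointer to its own separate heap allocation plus size bookkeeping), not plain float data. So the data you initialize the OpenCL memory objects with isn't contiguous: &in_vec[0][0] points only at the 10 floats of the first row, and copying 100 floats from there reads past the end of that row into whatever else happens to be on the heap, not into the rest of your matrix. The same problem occurs on the way back: enqueueReadBuffer writes 100 floats starting at &out_vec[0][0], so only the first row is guaranteed to receive results, and the remaining 90 floats are written past the end of that row (undefined behaviour).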
Use a single vector of size MxN instead. Take a look at this SO question.