Reputation: 31
I recently implemented and tested OpenCL, using a struct to carry and update a C++ class object via a simple kernel function, and found to my dismay that the same function was actually faster when run in a plain for loop without the kernel.
Here is the kernel function:

__kernel void function_x_y_(__global myclass_* input, long n)
{
    int gid = get_global_id(0);
    if (gid < n)
        input[gid].valuez = input[gid].valuey * input[gid].valuex * 8736;
}
Here is the for loop:

for (int i = 0; i < 100; i++) {
    thisclass[i].function_x_y();
}
and the class function:

void function_x_y() {
    valuez = valuex * valuey;
}
I timed both processes:

cout << "Run function in serial\n";
startTime = clock();
for (int i = 0; i < 100; i++) {
    thisclass[i].function_x_y();
}
endTime = clock();
cout << "It took (serial) " << (endTime - startTime) / (CLOCKS_PER_SEC / 1000000) << " ms. " << endl;

cout << "Run function in parallel using struct to write to object\n";
init_ocl();
startTime = clock();
load_kernel_from_struct("function_x_y_", p_struct, 100); // Loads function and variables into OpenCL
endTime = clock();
cout << "It took (parallel) " << (endTime - startTime) / (CLOCKS_PER_SEC / 1000000) << " ms. " << endl;
With the output:
Run function in serial
It took (serial) 5 ms.
Run function in parallel using struct to write to object
It took (parallel) 159010 ms.
I am using cl-helper.c by Andreas Klöckner.
I don't understand this; it should be faster. Any help or advice is welcome.
Is there a more accurate way to benchmark this? Could the slowdown be due to the time it takes to initialise OpenCL, allocate memory, and transfer the data to the kernel?
There must be a way to make this run faster. Should I transfer and initialise everything before timing the function?
Thanks, Hbyte.
Upvotes: 1
Views: 248
Reputation: 20396
The fact that your original test uses only 100 elements ought to be a pretty major clue as to what's happening, not least because of how much the timings changed when you bumped the number of iterations up to 5 million.
One thing I would suggest, incidentally, is to measure only the submission of the work data to the GPU and the retrieval of the results, not the time spent compiling the kernel. That more accurately models the comparison with the host code, which has obviously been compiled beforehand.
And, of course, if you plan to take full advantage of GPGPU devices, you need to make sure the workload is actually large enough to benefit from the parallelism, even in spite of the setup overhead.
Upvotes: 1