I modified the kernel code in Apple's "Performing Calculations on a GPU" sample project (https://developer.apple.com/documentation/metal/basic_tasks_and_concepts/performing_calculations_on_a_gpu?preferredLanguage=occ)
from this:
kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    result[index] = inA[index] + inB[index];
}
to this, which just wraps the same calculation in a for loop:
kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    for (int i = 0; i < 1000000; i++) { // added
        result[index] = inA[index] + inB[index];
    }
}
but the execution time of the program does not change. Am I doing something wrong?
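One likely explanation is that the loop does no observable extra work. A minimal CPU sketch in C (not Metal; all function names are hypothetical) models one GPU thread: every iteration stores the same loop-invariant value into the same element, so for n > 0 the loop is equivalent to a single assignment. This assumes `result` does not alias `inA` or `inB`, expressed here with `restrict`; whether the Metal compiler makes exactly this transformation is an assumption, not something the sample documents.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for one GPU thread of the modified kernel.
 * Assumption: result does not alias inA or inB (expressed with restrict);
 * without that guarantee the collapse below would not be valid. */
void add_arrays_loop(const float *restrict inA, const float *restrict inB,
                     float *restrict result, size_t index, int n)
{
    for (int i = 0; i < n; i++) {
        /* The right-hand side never depends on i, and each iteration
         * overwrites the same element with the same value. */
        result[index] = inA[index] + inB[index];
    }
}

/* The form an optimizer may legally reduce the loop to when n > 0. */
void add_arrays_collapsed(const float *restrict inA, const float *restrict inB,
                          float *restrict result, size_t index)
{
    result[index] = inA[index] + inB[index];
}

/* Hypothetical helper: runs both versions on small inputs and asserts
 * that they leave identical results. */
void check_collapse(void)
{
    const float a[3] = {1.0f, 2.5f, -4.0f};
    const float b[3] = {0.5f, 2.5f, 8.0f};
    float looped[3] = {0}, collapsed[3] = {0};
    for (size_t k = 0; k < 3; k++) {
        add_arrays_loop(a, b, looped, k, 1000000);
        add_arrays_collapsed(a, b, collapsed, k);
        assert(looped[k] == collapsed[k]);
    }
}
```

Because the two versions are indistinguishable from outside, a compiler is free to emit the cheaper one, which would leave the measured time unchanged.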
As @frank-schlegel suggested, I first changed the loop to:
for (int i = 0; i < 1000000; i++) {
    result[index] = inA[index] + inB[index] + i;
}
This did not change the performance, but when I used the following code instead:
for (int i = 0; i < 1000000; i++) {
    result[index] += i;
}
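the execution time increased significantly. My conclusion is that the Metal compiler probably optimizes the first two loops away: in both of them every iteration simply overwrites result[index], so only the final write is observable and the other iterations can be eliminated. With +=, each iteration reads the value written by the previous one, so the loop cannot be reduced to its last iteration.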
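To make the difference between the two attempts concrete, here is a small C sketch (a CPU stand-in, not Metal; the function names are hypothetical and `restrict` encodes the assumption that the buffers do not alias). The `+ i` version overwrites the element on every iteration, so for n > 0 it matches a single store of the last iteration's value. The `+=` version carries a dependency from one iteration to the next.

```c
#include <assert.h>
#include <stddef.h>

/* Second attempt: every iteration overwrites result[index]. */
void add_plus_i_loop(const float *restrict inA, const float *restrict inB,
                     float *restrict result, size_t index, int n)
{
    for (int i = 0; i < n; i++) {
        result[index] = inA[index] + inB[index] + i;
    }
}

/* Only the last iteration (i = n - 1) is observable, so for n > 0 the
 * loop above is equivalent to this single store. */
void add_plus_i_last_only(const float *restrict inA, const float *restrict inB,
                          float *restrict result, size_t index, int n)
{
    result[index] = inA[index] + inB[index] + (n - 1);
}

/* Third attempt: each iteration reads the value the previous one wrote,
 * so dropping any iteration would change the result. */
void accumulate_loop(float *result, size_t index, int n)
{
    for (int i = 0; i < n; i++) {
        result[index] += i;
    }
}

/* Hypothetical helper: asserts that the + i loop matches its last iteration. */
void check_last_write_wins(void)
{
    const float a[1] = {1.0f}, b[1] = {2.0f};
    float looped[1] = {0}, last_only[1] = {0};
    add_plus_i_loop(a, b, looped, 0, 1000000);
    add_plus_i_last_only(a, b, last_only, 0, 1000000);
    assert(looped[0] == last_only[0]);
}
```

For example, `accumulate_loop` with n = 4 starting from 0 yields 0 + 1 + 2 + 3 = 6, whereas keeping only the last iteration would give 3, so an optimizer has to execute the whole chain.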