user818117

Reputation: 420

Change in Apple Metal kernel does not change execution time

I modified the kernel code in Apple's sample project https://developer.apple.com/documentation/metal/basic_tasks_and_concepts/performing_calculations_on_a_gpu?preferredLanguage=occ

from this


kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    result[index] = inA[index] + inB[index];
}

to this, just embedding the same calculation inside a for loop:


kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    for (int i = 0; i < 1000000; i++) { // added
        result[index] = inA[index] + inB[index];
    }
}

but the execution time of the program does not change. Am I doing something wrong?

Upvotes: 0

Views: 54

Answers (1)

user818117

Reputation: 420

As @frank-schlegel suggested, I first changed the loop to:

for(int i=0;i<1000000;i++){
        result[index] = inA[index] + inB[index] + i;
}

This did not change the performance, but when I used the code below instead:

for(int i=0;i<1000000;i++){
        result[index] +=  i;
}

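the execution time changed significantly. My conclusion is that the Metal compiler probably optimizes the loop away in the first two versions: each iteration overwrites result[index] with a value that does not depend on the previous iteration, so only the final store matters. With +=, every iteration depends on the one before it, so the loop is much harder to remove.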
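To illustrate the conclusion above, here is a minimal CPU-side sketch in plain C++ (not Metal code, and not a claim about what the Metal compiler actually emits). The function names and the constant kIterations are hypothetical, chosen only for this example. It shows that the overwriting loop produces the same value as a single store, which is why an optimizer is allowed to drop the loop, while the accumulating loop depends on every iteration.

```cpp
// Hypothetical CPU-side stand-ins for the kernel bodies discussed above.
// The names below are illustrative only and do not come from the sample project.
constexpr int kIterations = 1000000;

// Mirrors `result[index] = inA[index] + inB[index] + i;` inside the loop:
// every iteration overwrites the previous value, so only the last store
// is observable after the loop finishes.
float overwrite_loop(float a, float b) {
    float result = 0.0f;
    for (int i = 0; i < kIterations; i++) {
        result = a + b + i;
    }
    return result;
}

// What an optimizer is free to reduce the loop above to: one store using
// the final value of the loop counter.
float overwrite_last_only(float a, float b) {
    return a + b + (kIterations - 1);
}

// Mirrors `result[index] += i;`: each iteration reads the value written by
// the previous one, so the work cannot be discarded as dead stores.
float accumulate_loop(float start) {
    float result = start;
    for (int i = 0; i < kIterations; i++) {
        result += i;
    }
    return result;
}
```

Calling overwrite_loop(1.0f, 2.0f) and overwrite_last_only(1.0f, 2.0f) gives the same value, whereas accumulate_loop has to carry its running total through every iteration. For timing experiments on the GPU, a loop-carried dependency like the one in accumulate_loop is presumably the safer way to make sure the extra work is actually executed.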

Upvotes: 1
