Mixalis Navridis

Reputation: 329

Java Benchmarking addition of arrays with both CPU and GPU and compare performance

I am trying to compare the performance of a simple array addition on the CPU and on the GPU, but the results I get are very strange.

I run the GPU task with Aparapi. This is my code:

package gpu;
import com.aparapi.Kernel;
import com.aparapi.Range;


public class Try {
    public static void main(String[] args) {

        final int size = 512;
        final float[] a = new float[size];
        final float[] b = new float[size];

        for (int i = 0; i < size; i++) {
            a[i] = (float) (Math.random() * 100);
            b[i] = (float) (Math.random() * 100);
        }


        //##############CPU-TASK########################
        long start = System.nanoTime();
        final float[] sum = new float[size];
        for(int i=0;i<size;i++){
            sum[i] = a[i] + b[i];
        }
        long finish = System.nanoTime();
        long timeElapsed = finish - start;
        //######################################



        //##############GPU-TASK########################
        final float[] sum2 = new float[size];
        Kernel kernel = new Kernel(){
            @Override public void run() {
                int gid = getGlobalId();
                sum2[gid] = a[gid] + b[gid];
            }
        };

        long start1 = System.nanoTime();
        kernel.execute(Range.create(size));
        long finish2 = System.nanoTime();
        long timeElapsed2 = finish2 - start1;
        //##############GPU-TASK########################


        System.out.println("cpu"+timeElapsed);
        System.out.println("gpu"+timeElapsed2);

        kernel.dispose();
    }
}

My specs are:

Aparapi is running on an untested OpenCL platform version: OpenCL 3.0 CUDA 11.6.13
Intel Core i7 6850K @ 3.60GHz   Broadwell-E/EP 14nm Technology
2047MB NVIDIA GeForce GTX 1060 6GB (ASUStek Computer Inc)

The results that I get are this:

cpu12000
gpu5732829900

My question is: why is the GPU so slow, and why does the CPU outperform it? I expected the GPU to be faster than the CPU. Is my measurement wrong, and is there any way to improve it?

Upvotes: 2

Views: 183

Answers (1)

Egor

Reputation: 565

This code measures the host-side execution time of the GPU task. The measured time therefore includes the execution of the task on the GPU, the time spent copying the input data to the GPU, the time spent reading the results back from the GPU, and the overhead introduced by Aparapi. In addition, according to the documentation for the Kernel class, Aparapi uses lazy initialization:

On the first call to Kernel.execute(int _globalSize), Aparapi will determine the EXECUTION_MODE of the kernel. This decision is made dynamically based on two factors:

  • Whether OpenCL is available (appropriate drivers are installed and the OpenCL and Aparapi dynamic libraries are included on the system path).
  • Whether the bytecode of the run() method (and every method that can be called directly or indirectly from the run() method) can be converted into OpenCL.

Therefore, the host-side execution time of the GPU task cannot be compared directly with the execution time of the CPU task, because it includes additional work that is performed only once.
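To illustrate the effect of one-time work on a host-side measurement, here is a minimal sketch that uses only the standard library. It does not use Aparapi: lazyAddTask is a hypothetical stand-in for a lazily initialized kernel, and the 50 ms sleep is an assumed, simulated setup cost, not a measured Aparapi figure. The idea is to time the first call separately and then average many later calls.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WarmupTiming {

    /** Times a single invocation of the task, in nanoseconds. */
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Average time per call, in nanoseconds, over the given number of calls. */
    static long averageNanos(Runnable task, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += timeOnce(task);
        }
        return total / iterations;
    }

    /**
     * Hypothetical stand-in for a lazily initialized kernel: the first call
     * pays a simulated one-time setup cost (a sleep), and every call performs
     * the element-wise addition.
     */
    static Runnable lazyAddTask(float[] a, float[] b, float[] sum, long setupMillis) {
        AtomicBoolean initialized = new AtomicBoolean(false);
        return () -> {
            if (initialized.compareAndSet(false, true)) {
                try {
                    Thread.sleep(setupMillis); // simulated one-time initialization
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] = a[i] + b[i];
            }
        };
    }

    public static void main(String[] args) {
        final int size = 512;
        float[] a = new float[size], b = new float[size], sum = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        Runnable task = lazyAddTask(a, b, sum, 50);
        long first = timeOnce(task);             // includes the one-time setup
        long steady = averageNanos(task, 1000);  // later calls only

        System.out.println("first call:   " + first + " ns");
        System.out.println("steady state: " + steady + " ns");
    }
}
```

With a real kernel, the same pattern would mean calling kernel.execute(...) once before starting the timer and then timing repeated calls. Even then, the host-side number still includes data transfers, so it is not the pure kernel execution time.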

To get the execution time breakdown for the kernel, use the getProfileInfo() call:

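// Additional imports needed at the top of the file:
import java.util.List;
import com.aparapi.ProfileInfo;

kernel.execute(Range.create(size));
// Print the duration of each profiled stage of the last execution.
List<ProfileInfo> profileInfo = kernel.getProfileInfo();
for (final ProfileInfo p : profileInfo) {
    System.out.println(p.getType() + " " + p.getLabel() + " " + (p.getEnd() - p.getStart()) + "ns");
}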

Also, note that profiling must be enabled by setting the following property: -Dcom.aparapi.enableProfiling=true. For more information, see the "Profiling the Kernel" article and the implementation of the ProfileInfo class.

Upvotes: 3
