Framester

Reputation: 35471

How to measure the gflops of a matrix multiplication kernel?

In the book Programming Massively Parallel Processors the number of gflops is used to compare the efficiency of different matrix multiplication kernels. How would I compute this for my own kernels on my own machine?

Somewhere in the NVIDIA forums I found this 'algorithm', but I don't know how valid it is or where the factor of two comes from.

NumOps = 2 * pow(MatrixSize,3)
gflops = 1.0e-9 * NumOps / ExecutionTime

p.s. please feel free to change the tags...

Upvotes: 6

Views: 5531

Answers (1)

Heatsink

Reputation: 7751

You can measure GFLOP/s by running the algorithm with a large input and measuring the execution time, then plugging the execution time and matrix size into that formula. For matrix sizes big enough to keep the entire machine busy, the achieved FLOP rate depends only weakly on matrix size.

The GPU matrix multiplication algorithm performs the same number of floating-point operations as the naive algorithm:

for (int i = 0; i < MatrixSize; i++)
  for (int j = 0; j < MatrixSize; j++)
    for (int k = 0; k < MatrixSize; k++)
      C[j][i] += A[j][k] * B[k][i];

There are 2 floating-point operations in the loop body (one multiply and one add), and MatrixSize * MatrixSize * MatrixSize iterations of the loop body, which gives you the formula for NumOps. GFLOP/s is just the number of floating-point operations per second, divided by 10^9 ('giga').

Upvotes: 8
