Francis Saa-Dittoh

Reputation: 57

A Method of counting Floating Point Operations in a C++/CUDA Program using PTX

I have a somewhat large CUDA application and I need to calculate the attained GFLOPS. I'm looking for an easy and perhaps generic way of counting the number of floating point operations.

Is it possible to count floating point operations from the generated PTX code (as shown below), using a predefined list of floating point instructions in the assembly? Based on the code, can the counting be made generic? For example, does add.s32 %r58, %r8, -2; count as one floating point operation?

EXAMPLE:

BB3_2:
.loc 2 108 1
mov.u32         %r8, %r79;
setp.ge.s32     %p1, %r78, %r16;
setp.lt.s32     %p2, %r78, 0;
or.pred         %p3, %p2, %p1;
@%p3 bra        BB3_5;

add.s32         %r58, %r8, -2;
setp.lt.s32     %p4, %r58, 0;
setp.ge.s32     %p5, %r58, %r15;
or.pred         %p6, %p4, %p5;
@%p6 bra        BB3_5;

.loc 2 112 1
ld.global.u8    %rc1, [%rd17];
cvt.rn.f32.u8   %f11, %rc1;
mul.wide.u32    %rd12, %r80, 4;
add.s64         %rd13, %rd7, %rd12;
ld.local.f32    %f12, [%rd13];
fma.rn.f32      %f14, %f11, %f12, %f14;
.loc 2 113 1
add.f32         %f15, %f15, %f12;

Or are there far simpler ways of counting FPOs and this is a waste of time?

Upvotes: 0

Views: 1397

Answers (1)

Roger Dahl

Reputation: 15734

The easiest way to count floating point operations is to have the CUDA profiler do it for you. Selecting the Achieved FLOPS experiment produces charts like this:

[Figure: FLOPS experiment]

The Floating Point Operations chart displays a count of each type of floating point operation executed by your kernel.
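
If a rough static estimate from the PTX is still wanted, the idea in the question can be sketched as below. This is only an illustration under stated assumptions, not a profiler replacement: the function names (count_static_flops, total_flops, gflops) and the weight table are hypothetical, fma/mad are credited as 2 operations by convention, and only the listed opcodes with an f16/f32/f64 type suffix are counted. Note that add.s32 is an integer instruction, so it is not counted. Each instruction is counted once as it appears in the text, so loops and branches make the executed count differ from this number.

```python
import re
from collections import Counter

# Assumed weights: operations credited per instruction. Counting fma/mad as 2
# (one multiply plus one add) is a common convention, not an official rule.
FLOP_WEIGHTS = {"add": 1, "sub": 1, "mul": 1, "div": 1, "fma": 2, "mad": 2}
FLOAT_TYPES = {"f16", "f32", "f64"}

# A few lines taken from the PTX excerpt in the question.
PTX_SAMPLE = """
BB3_2:
.loc 2 108 1
@%p3 bra        BB3_5;
add.s32         %r58, %r8, -2;
cvt.rn.f32.u8   %f11, %rc1;
mul.wide.u32    %rd12, %r80, 4;
fma.rn.f32      %f14, %f11, %f12, %f14;
add.f32         %f15, %f15, %f12;
"""

def count_static_flops(ptx_text):
    """Count floating point arithmetic instructions as they appear in the text."""
    counts = Counter()
    for line in ptx_text.splitlines():
        line = line.strip()
        # Drop an optional guard predicate such as "@%p3" or "@!%p3".
        line = re.sub(r"^@!?%\w+\s+", "", line)
        # Skip blanks, directives (.loc, .reg, ...), comments, and labels.
        if not line or line.startswith((".", "//")) or line.endswith(":"):
            continue
        opcode = line.split()[0]
        base, *modifiers = opcode.split(".")
        # For the arithmetic opcodes above, the last modifier is the operand type.
        if base in FLOP_WEIGHTS and modifiers and modifiers[-1] in FLOAT_TYPES:
            counts[base] += 1
    return counts

def total_flops(counts):
    """Apply the assumed weights to the per-opcode counts."""
    return sum(FLOP_WEIGHTS[op] * n for op, n in counts.items())

def gflops(flop_count, seconds):
    """Attained GFLOPS from an operation count and a measured kernel time."""
    return flop_count / seconds / 1e9

if __name__ == "__main__":
    counts = count_static_flops(PTX_SAMPLE)
    print(dict(counts), total_flops(counts))
```

To turn such a count into attained GFLOPS, the static count would have to be multiplied by how often each instruction actually executes across all threads, which is exactly what the profiler measures directly.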

Upvotes: 3
