Cortex M7 floating arithmetic instruction duration with zero operand

Question

I'd like to know whether a floating-point instruction like VMUL executes significantly faster on a Cortex M7 FPU when one of its operands is zero.

The reason is that I'm profiling software that processes many variables coming from analog sources, and more precisely how these variables evolve over time. Right now the "front end" (i.e. the analog sources) is not available, so I'm using simulated variables instead. Since they don't change over time, many variables in the code are zero, and I'm worried this makes my profiling results unrepresentative.

Peter Cordes · Accepted Answer

Pipelined CPUs usually have fixed (not data-dependent) latencies for everything except very slow operations like division. Otherwise the pipeline would have to deal with write-back conflicts: a "fast" instruction started a cycle or two after a "slow" one could finish in the same cycle and compete for the same write-back slot.

You could test it yourself by running vmul in a latency-bound loop (e.g. multiply a register by itself 3 or 4 times in an unrolled loop, so each multiply depends on the previous result). Try with "simple" values like 0.0, then with non-simple values that have many significant digits, like 1.0000000001 in double precision. For single precision use something like 1.0000001f instead, because 1.0000000001 rounds to exactly 1.0f. Run enough loop iterations to hide measurement overhead, but few enough that you stop before the value overflows to +Inf.
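A minimal C sketch of that experiment, under some stated assumptions: `read_timer`, `mul_chain` and `time_chain` are hypothetical names, and `read_timer` falls back to `clock()` so the code compiles and runs on a host. On an actual Cortex M7 you would replace its body with a read of the DWT cycle counter (`DWT->CYCCNT` in CMSIS, after enabling it). Instead of squaring, it multiplies an accumulator by a constant slightly above 1 (1.0000001f, whose nearest float is 1 + 2^-23), so the value grows slowly and stays finite over millions of multiplies; repeatedly squaring such a value would overflow a float within a few dozen steps.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical timing hook (assumption). On a Cortex M7 target, return the
   DWT cycle counter here instead (e.g. DWT->CYCCNT via CMSIS, once enabled).
   On a host build this uses clock() so the sketch still runs. */
static uint32_t read_timer(void)
{
    return (uint32_t)clock();
}

/* Latency-bound chain: every multiply depends on the previous result, so
   they cannot overlap and the run time tracks the multiply latency.
   Unrolled 4x to keep loop overhead small relative to the multiplies. */
static float mul_chain(float x, float k, uint32_t iters)
{
    for (uint32_t i = 0; i < iters; i++) {
        x *= k;
        x *= k;
        x *= k;
        x *= k;
    }
    return x;
}

/* Times one chain. The volatile copies stop the compiler from
   constant-folding the multiplies, and the result is stored through
   'result' so the loop is not discarded as dead code. */
static uint32_t time_chain(float start, float k, uint32_t iters, float *result)
{
    volatile float vstart = start;
    volatile float vk = k;
    uint32_t t0 = read_timer();
    *result = mul_chain(vstart, vk, iters);
    uint32_t t1 = read_timer();
    return t1 - t0;
}
```

To compare, call `time_chain(0.0f, 1.0000001f, n, &r)` (a zero operand in every multiply) and `time_chain(1.0f, 1.0000001f, n, &r)` with the same `n`, and compare the returned counts. It's worth checking the generated assembly for four dependent `vmul.f32` per iteration: options like `-ffast-math` may reassociate the chain, and the compiler is in principle free to move pure arithmetic across the timer reads.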