Xirema
Xirema

Reputation: 20396

Is there a way to optimize this function?

For an application I'm working on, I need to take two integers and add them together using a particular mathematical formula. This ends up looking like this:

int16_t add_special(int16_t a, int16_t b) {
    float limit = std::numeric_limits<int16_t>::max();//32767 as a floating point value
    float a_fl = a, b_fl = b;
    float numerator = a_fl + b_fl;
    float denominator = 1 + a_fl * b_fl / std::pow(limit, 2);
    float final_value = numerator / denominator;
    return static_cast<int16_t>(std::round(final_value));
}

Any readers with a passing familiarity with physics will recognize that this formula is the same as what is used to calculate the sum of near-speed-of-light velocities, and the calculation here intentionally mirrors that computation.

The code as-written gives the results I need: for low numbers, they nearly add together normally, but for high numbers, they converge to the maximum value of 32767, i.e.

Which all appears to be correct.

The problem, however, is that the function as-written involves first transforming the numbers into floating point values before transforming them back into integers. This seems like a needless detour for numbers that I know, as a principle of its domain, will never not be integers.

Is there a faster, more optimized way to perform this computation? Or is this the most optimized version of this function I can create?

I'm building for x86-64, using MSVC 14.X, although methods that also work for GCC would be beneficial. Also, I'm not interested in SSE/SIMD optimizations at this stage; I'm mostly just looking at the elementary operations being performed on the data.

Upvotes: 1

Views: 153

Answers (3)

Johannes Overmann
Johannes Overmann

Reputation: 5151

Suggestions:

  • Use 32767.0*32767.0 (which is a constant) instead of std::pow(limit, 2).
  • Use integer values as much as possible, potentially with fixed points. Just the two divisions are a problem. Use floats just form them, if necessary (depends on the input data ranges).
  • Make it inline if the function is small and if it is appropriate.

Something like:

int16_t add_special(int16_t a, int16_t b) {
    float numerator = int32_t(a) + int32_t(b); // Cannot overflow.
    float denominator = 1 + (int32_t(a) * int32_t(b)) / (32767.0 * 32767.0); //  Cannot overflow either.
    return (numerator / denominator) + 0.5; // Relying on implementation defined rounding. Not good but potentially faster than std::round().
}

The only risk with the above is the omission of the explicit rounding, so you will get some implicit rounding.

Upvotes: 1

Jarod42
Jarod42

Reputation: 217468

You might avoid floating number and does all computation in integral type:

constexpr int16_t add_special(int16_t a, int16_t b) {
    std::int64_t limit = std::numeric_limits<int16_t>::max();
    std::int64_t a_fl = a;
    std::int64_t b_fl = b;
    return static_cast<int16_t>(((limit * limit) * (a_fl + b_fl)
                                 + ((limit * limit + a_fl * b_fl) / 2)) /* Handle round */
                                / (limit * limit + a_fl * b_fl));
}

Demo

but according to Benchmark, it is not faster for those values.

Upvotes: 2

Bob__
Bob__

Reputation: 12769

As noted by Johannes Overmann, a big performance boost is gained by avoiding std::round, at the cost of some (little) discrepancies in the results, though.

I tried some other little changes HERE, where it seems that the following is a faster approach (at least for that architecture)

constexpr int32_t i_max = std::numeric_limits<int16_t>::max();
constexpr int64_t i_max_2 = static_cast<int64_t>(i_max) * i_max;

int16_t my_add_special(int16_t a, int16_t b)
{
    // integer multipication instead of floating point division
    double numerator = (a + b) * i_max_2; 
    double denominator = i_max_2 + a * b;
    // Approximated rounding instead of std::round
    return 0.5 + numerator / denominator;
}

Upvotes: 1

Related Questions