JHBonarius

Reputation: 11271

C++ "No raw loops" without losing perfomance

So the 'new (old) big thing' is "No Raw Loops" in C++. I'm trying to write code that way, but it seems very inefficient. Yes, there are STL algorithms that can do just about anything, but they don't seem very efficient.

For instance, I have a situation where I want a pointer to the node with the highest score in an array of nodes. Determining that score is a costly floating-point operation. So I implemented the STL-algorithm version and compared it with a raw loop:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{
    
    std::array<Node, 10> nodes;
    
    counter = 0;
    Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;
    
    counter = 0;
    double maxScore = -DBL_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

Evaluating this, the STL version calls the costly Score function 18 times (std::max_element makes 9 comparisons for 10 elements, and the comparator calls Score() on both arguments), while the raw loop needs only 10 evaluations...

Am I doing it wrong, or are raw loops just not that bad?

Edit: after the suggestion by user58697 that the cout and the static counter prevent compiler optimization, I changed the code:

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -DBL_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}

I ran the loop twice to eliminate startup behavior... results of the second loop (compiled with g++ 9.1 -O3):

algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)

So that's not it.

Edit: seeing that this question got an upvote, people are still looking at it. Since it was asked, C++20 has been released. C++20's ranges library has a feature that could have helped here, called projection.

That is, in this case you could use std::ranges::max_element, or even std::ranges::max (which was missing from the old std algorithms), like

Node const* node = &*std::ranges::max_element(nodes, {}, &Node::Score);
...
Node const& node = std::ranges::max(nodes, {}, &Node::Score);

However, projection is not the solution here, because of an implementation choice: the projected values are not cached. The Proj projection function is called again and again, once for every argument of the Comp comparator.

(The internal function call looks something like

return std::invoke(__comp, std::invoke(__proj, __a), std::invoke(__proj, __b)) ? __b : __a;

)
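If you do want to stay with algorithms and still evaluate Score() only once per node, one possible workaround (just a sketch with illustrative variable names, needing <vector> in addition to <algorithm>) is to materialize the scores first and then search those:

std::vector<double> scores(nodes.size());
std::ranges::transform(nodes, scores.begin(), &Node::Score); // one Score() call per node
auto const maxIt = std::ranges::max_element(scores);
Node const* nodePtr = &nodes[static_cast<std::size_t>(std::distance(scores.begin(), maxIt))];

This trades the repeated Score() calls for an extra array of doubles.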

Upvotes: 6

Views: 2481

Answers (4)

JHBonarius

Reputation: 11271

I just wanted to post the algorithm I ended up using, based on Richard's answer:

#include <functional> // for std::invoke and std::less

template <typename FwdIt, typename Proj, typename Comp = std::less<>>
constexpr FwdIt max_projected_element(FwdIt first, FwdIt last, Proj proj, Comp comp = {}) {
    auto found = first;
    if (first != last) {
        auto best = std::invoke(proj, *found);
        while (++first != last) {
            if (auto const value = std::invoke(proj, *first);
                std::invoke(comp, best, value)) {
                found = first;
                best = value;
            }
        }
    }
    return found;
}

which allows me to call the algo as

Node const* nodePtr = &*max_projected_element(
    std::cbegin(nodes), std::cend(nodes), &Node::Score);
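
Since the projection goes through std::invoke, a lambda works just as well, for example:

Node const* nodePtr = &*max_projected_element(
    std::cbegin(nodes), std::cend(nodes),
    [](Node const& node) { return node.Score(); });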

Upvotes: 2

user58697

Reputation: 7923

It seems that the problem is in how the test is written. The statements

    std::cout << "complex calculation\n";
    counter++;

produce a visible side effect, and the compiler is obliged to call Score every time it appears. If Score were pure (that is, had no side effects), the compiler would optimize the redundant calls away by caching.

Try to use a real Score (I hope the real one is pure), and compare the execution times.

Upvotes: 0

Richard

Reputation: 61279

Replacing raw loops with abstracted algorithms is good style because then you can reuse the algorithm many times but test it only once. Wrapping the loop this way may seem like syntactic sugar, but it greatly reduces the potential for bugs in your code because you can now do extensive unit tests on the abstracted algorithm and you never need to worry about mistakenly implementing it incorrectly when you need it.

However, you're comparing apples and oranges here. Your max_element version recalculates Score() for every comparison, whereas your raw loop caches the result of Score() for each node.

A better implementation of Node might be:

class Node {
private:
    // needs <limits> for std::numeric_limits and <cmath> for std::isnan
    mutable double cached_score = std::numeric_limits<double>::quiet_NaN();
public:
    auto Score() const -> double {
        if (std::isnan(cached_score)) {
            std::cout << "complex calculation\n";
            counter++;
            cached_score = 1;
        }
        return cached_score;
    }
    void invalidate_cache() {
        cached_score = std::numeric_limits<double>::quiet_NaN();
    }
};

This way the complex calculation is performed at most once per node.
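
With the cached Score() in place, the std::max_element call from the question can stay exactly as it is; for example, reusing the question's setup:

counter = 0;
Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
    [](Node const& node1, Node const& node2) {
        return node1.Score() < node2.Score();
    });
std::cout << "algorithm count " << counter << std::endl; // should now print 10 for the 10 nodes

Call invalidate_cache() whenever a node's underlying data changes.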

Alternatively, write your own abstraction:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
    ForwardIt first,
    ForwardIt last,
    Evaluate eval,
    Compare comp
){
    if (first == last) return last;

    ForwardIt largest = first;
    auto largest_val = eval(*first);
    ++first;
    for (; first != last; ++first) {
        const auto this_val = eval(*first);
        if (comp(largest_val, this_val)) {
            largest = first;
            largest_val = this_val;
        }
    }
    return largest;
}

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = &*max_eval_element(std::cbegin(nodes), std::cend(nodes),
                                       [](Node const& node){ return node.Score(); },
                                       [](double const &a, double const &b) {
                                           return a<b;
                                       });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -DBL_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

In this instance, both versions perform the same number of evaluations.

Many internal codebases I've worked with have extensive libraries that extend the STL. This gives the teams I've worked on much greater confidence that their code is correct, and it lets you understand complex operations at a glance. In this way, these abstractions also reduce the effort of understanding code and the effort of communication.

Upvotes: 6

Fred Larson

Reputation: 62063

I think your case is somewhat pathological for std::max_element, since your comparator calls Score() on both elements every time, and you declare that call to be expensive. So where you do:

for (const auto& node : nodes) {
    auto score = node.Score();
    if (score > maxScore) {
        maxScore = score;
        nodePtr = &node;
    }
}

... std::max_element effectively has to do:

for (const auto& node : nodes) {
    if (node.Score() > nodePtr->Score()) {
        nodePtr = &node;
    }
}

I think std::max_element would be just fine in a more normal case, where the value to compare by is cheap to access.
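
For example (an illustrative sketch with made-up types, assuming <vector> and <algorithm> are included), when the key is a plain data member, reading it twice per comparison costs essentially nothing:

struct Item { int weight; };

std::vector<Item> items{ {3}, {7}, {2} };
auto const heaviest = std::max_element(items.cbegin(), items.cend(),
    [](Item const& a, Item const& b) { return a.weight < b.weight; });
// heaviest now points at the {7} element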

Upvotes: 1
