Why std::find_if significantly accelerates function only on Intel CPUs?

Question

Rewriting the following function using std::find_if yields roughly 2x speed up, but only when given Intel CPU.

bool process_update_range_based(int treeId, int newHeight, bool tall,
                                  std::vector& vec, std::vector& updateSet) {
    for (auto &hr : vec) {
        if (hr.treeId == treeId) {
            if (hr.height == newHeight)
                return false;
            else {
                hr.height = newHeight;
                updateSet.push_back(treeId);
                return true;
            }
        }
    }
    vec.emplace_back(treeId, newHeight, tall);
    return true;
}

The following measurements were taken: simulate 10 times, measure average and minimum time to execute N iterations. Minimal value is the most reliable, especially in browser when there is no data on external CPU clocks (Godbolt values do vary a lot). Compiler flags were -O3 -march=x86-64-v3

CPU	for loop avg	find_if avg	for loop min	find_if min
Godbolt GenuineIntel	N/A	N/A	0.0275	0.0136
Godbolt AuthenticAMD	N/A	N/A	0.0245	0.0212
Apple M2	0.0258	0.0235	0.0216	0.0216
AMD 5600H	0.0288	0.0287	0.0244	0.0217
Intel i5-1335U	0.0294	0.0152	0.0223	0.0125

Godbolt source code & benchmark

As you can see, on Intel CPUs std::find_if almost 2x faster compared to basic loop. Meanwhile AMD/Apple CPUs either 10-20% or completely the same. This behaviour is consistent across multiple compilers (g++, clang++, MSVC)

It's especially confusing since acceleration occures only when replacing range-based loop with library function. .asm code looks like some simple moves and comparisons. Godbolt with functions assembly only (no benchmark)

What causes this difference? What's so special about Intel CPUs? Why they execute the same assembly 2x faster than AMD/Apple CPUs?

UPD. The std::array version show similar results overall. However with std::array it seems that results differ more depending on order of execution:

Godbolt std::array source code & benchmark

Godbolt std::array source code

Why std::find_if significantly accelerates function only on Intel CPUs?

Answers (0)

Related Questions