Reputation: 298
Using valgrind and perf/FlameGraphs, I have identified the part of my application that consumes almost 100% of the CPU:
for(size_t i = 0; i < objects.size(); i++) {
    //this part consumes 11% CPU ----->
    collions_count = database->get_collisions(collisions_block, objects[i].getKey());
    feature1 = objects[i].feature1;
    //<--------
    for(int j = 0; j < collions_count * 2; j += 2) {
        hash =
            ((collisions_block[j] & config::MASK_1) << config::SHIFT) |
            ((collisions_block[j+1] - feature1) & config::MASK_2);
        if (++offsets[hash] >= config::THRESHOLD_1) {
            //... this part consumes < 1% of CPU
        }
    }
}
The hash calculation and the if statement that follows it take nearly 90% of the application's total CPU time.
collisions_block is initialized once and is of type int[100000].
config:: is a namespace with variables containing the global configuration.
offsets is initialized once and is of type uint8_t[1<<24].
The CPU time is spent in usr; there is no iowait in the mpstat output.
The code is compiled with -std=gnu++11 -Ofast -Wall.
Is there any way to speed up the inner loop?
Upvotes: 4
Views: 259
Reputation: 298
I identified the performance bottleneck to be the unordered access to the array in ++offsets[hash]. It was consuming most of the CPU time (75+%). I achieved a 2.5x speed increase by reducing the size of the array from 1<<24 to 1<<21 and experimenting with an appropriate MASKS configuration.
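A uint8_t table with 1<<24 entries occupies 16 MiB, while one with 1<<21 entries occupies 2 MiB, so the smaller table is far more likely to stay in the CPU caches when it is indexed in an unordered way. That this is the mechanism behind the speedup is my assumption, not something measured directly above. The following is a minimal sketch for checking the effect in isolation; random_increments, time_increments and the xorshift generator are hypothetical helpers, not part of the original application.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Performs n pseudo-random increments on `table`.
// The table size must be a power of two so that masking yields a valid index.
void random_increments(std::vector<std::uint8_t>& table, std::size_t n,
                       std::uint32_t seed = 2463534242u) {
    assert(seed != 0);  // xorshift32 stays at zero forever if seeded with zero
    assert(!table.empty() && (table.size() & (table.size() - 1)) == 0);
    const std::size_t mask = table.size() - 1;
    std::uint32_t x = seed;
    for (std::size_t i = 0; i < n; ++i) {
        // xorshift32: a cheap generator, so the memory access dominates the loop
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        ++table[x & mask];  // unordered access, as with ++offsets[hash]
    }
}

// Returns the seconds spent on n random increments in a table of 1<<bits bytes.
double time_increments(unsigned bits, std::size_t n) {
    std::vector<std::uint8_t> table(std::size_t{1} << bits, 0);
    const auto start = std::chrono::steady_clock::now();
    random_increments(table, n);
    const auto stop = std::chrono::steady_clock::now();
    // Read one counter through a volatile so the loop is not optimized away.
    volatile std::uint8_t sink = table[0];
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing time_increments(24, n) with time_increments(21, n) for a large n (for example 100 million) should show whether the table size alone makes a difference on a given machine. The result depends on the cache sizes of that machine, so no particular ratio is guaranteed.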
I will briefly describe how I identified the problem. First, I moved the hash calculation and the condition check into separate functions:
for(size_t i = 0; i < objects.size(); i++) {
    //this part consumes 11% CPU ----->
    collions_count = database->get_collisions(collisions_block, objects[i].getKey());
    feature1 = objects[i].feature1;
    //<--------
    for(int j = 0; j < collions_count * 2; j += 2) {
        hash = calculate_hash(collisions_block[j],
                              collisions_block[j+1],
                              feature1,
                              config::MASK_1,
                              config::MASK_2,
                              config::SHIFT);
        if (check_condition(hash, config::THRESHOLD_1)) {
            //... this part consumes < 1% of CPU
        }
    }
}
I marked the new functions with __attribute__((noinline)) to prevent gcc from inlining them (they will not appear on the call stack if inlined).
I compiled with the -g -rdynamic gcc flags.
I recorded a profile of the running process:
perf record -p <pid> -F 200 -g --call-graph dwarf -- sleep 60
I generated the flame graph from the recording:
perf script | ./stackcollapse-perf.pl > out.perf-folded && ./flamegraph.pl out.perf-folded > graph.svg
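For reference, the two extracted functions could look roughly like the sketch below. The bodies are taken from the inner loop of the question, but the parameter types, the concrete config:: values and the use of a std::vector for offsets are assumptions made only so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real configuration, which is not shown.
namespace config {
constexpr std::uint32_t MASK_1 = 0xFFF;  // assumed value
constexpr std::uint32_t MASK_2 = 0x1FF;  // assumed value
constexpr int SHIFT = 9;                 // assumed value
constexpr std::uint8_t THRESHOLD_1 = 3;  // assumed value
}

// Assumed counter table with 1<<21 entries (the reduced size);
// with the masks above the largest possible hash is (1<<21) - 1.
static std::vector<std::uint8_t> offsets(std::size_t{1} << 21, 0);

// noinline keeps the function visible as its own frame in perf call stacks.
__attribute__((noinline))
std::uint32_t calculate_hash(int a, int b, int feature1,
                             std::uint32_t mask1, std::uint32_t mask2,
                             int shift) {
    return ((static_cast<std::uint32_t>(a) & mask1) << shift) |
           (static_cast<std::uint32_t>(b - feature1) & mask2);
}

// Increments the counter for `hash` and reports whether it reached the threshold.
__attribute__((noinline))
bool check_condition(std::uint32_t hash, std::uint8_t threshold) {
    assert(hash < offsets.size());
    return ++offsets[hash] >= threshold;
}
```

With the split in place, the flame graph shows calculate_hash and check_condition as separate frames, which makes it possible to see how much time goes to the arithmetic and how much to the ++offsets[hash] access.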
Upvotes: 1