Reputation: 3637
I program in C++ and use CAS (compare-and-swap) operations for thread synchronization.
I profiled my program with VTune and found that a huge portion of the time was spent on the CAS operation.
I took a look at the assembly code.
The profiling result shows that a significant portion of the time is being spent on 'movq %rax, (%rsi)', not on 'lock cmpxchgq %rcx, (%rdi)'.
How is the 'movq %rax, (%rsi)' operation related to the CAS operation? Which data is being moved by this operation?
Upvotes: 0
Views: 79
Reputation: 301
It is actually the lock cmpxchg that is taking this time. The VTune release notes mention the following limitation, which explains this: Running time is attributed to the next instruction (200108041)
Upvotes: 1
Reputation: 32717
The lock cmpxchgq is what is taking a long time. When the profiler samples where the program currently is, it sometimes has to wait for an instruction to finish executing before it can find out. As a result, the instruction that follows a long, non-interruptible instruction is reported as taking a large amount of time, when it is really the previous instruction that is slow.
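As an illustration of where such a mov can come from, here is a minimal sketch of a CAS retry loop. It is not the asker's code: the names cas_increment and run_cas_counter are hypothetical, and the exact instructions depend on the compiler and flags. On x86-64, compare_exchange_weak is typically compiled to lock cmpxchg, and when the exchange fails the value observed in memory (left in %rax by cmpxchg) is written back into `expected`, which may show up as a plain mov such as 'movq %rax, (%rsi)' right after the locked instruction.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: increment `target` with a CAS retry loop.
std::uint64_t cas_increment(std::atomic<std::uint64_t>& target) {
    std::uint64_t expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the value it observed into
    // `expected`; that write-back is an ordinary store that follows the
    // locked compare-exchange in the generated code (assumption: typical
    // x86-64 code generation, not verified against the asker's binary).
    while (!target.compare_exchange_weak(expected, expected + 1,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
        // Retry with the refreshed `expected`.
    }
    return expected + 1;
}

// Hypothetical driver: several threads contend on one counter, which makes
// the locked instruction expensive because the cache line bounces between cores.
std::uint64_t run_cas_counter(unsigned threads, unsigned iters) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (unsigned i = 0; i < iters; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Profiling a call such as run_cas_counter(4, 1000000) would be one way to check whether the samples land on the instruction after lock cmpxchg rather than on the locked instruction itself.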
Upvotes: 2