syko

Reputation: 3637

Difficulty understanding the assembly code of '__atomic_compare_exchange'

I program in C++ and use CAS operations for thread synchronization.

I profiled my program with VTune and found that a large portion of the time was spent on the CAS operation.

I took a look at the assembly code.


The profiling result shows that a significant portion of the time is spent on 'movq %rax, (%rsi)', not on 'lock cmpxchgq %rcx, (%rdi)'.

How is the 'movq %rax, (%rsi)' operation related to the CAS operation? Which data is being moved by this operation?

Upvotes: 0

Views: 79

Answers (2)

Vital

Reputation: 301

It is actually the lock cmpxchg that is taking this time. The following limitation, mentioned in the VTune release notes, explains this: running time is attributed to the next instruction (200108041).

Upvotes: 1

1201ProgramAlarm

Reputation: 32717

The lock cmpxchgq is taking a long time. When the profiler determines where the program currently is, it sometimes has to wait for an instruction to finish executing before it can find out. This causes the instruction following a long, non-interruptible instruction to be reported as taking up a large amount of time, when it is really the previous instruction that is slow.
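As for which data the movq moves: when cmpxchg fails, it leaves the value it found in memory in %rax, and a compare-exchange in C++ is required to write that observed value back into the caller's `expected` object. Under the x86-64 System V calling convention the second pointer argument arrives in %rsi, so the store is presumably that write-back, though I can't confirm it without seeing the full listing. Here is a minimal sketch of the behavior using std::atomic; the names `CasResult`, `try_cas` and `demo` are made up for illustration and are not from the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: records what a single CAS attempt did.
struct CasResult {
    bool succeeded;                // did the exchange happen?
    std::uint64_t expected_after;  // contents of 'expected' after the call
    std::uint64_t value_after;     // contents of the atomic after the call
};

// Hypothetical helper: performs one strong CAS and reports the outcome.
inline CasResult try_cas(std::atomic<std::uint64_t>& target,
                         std::uint64_t expected,
                         std::uint64_t desired) {
    // On failure, compare_exchange_strong overwrites 'expected' with the
    // value actually found in 'target'. On success it leaves it untouched.
    bool ok = target.compare_exchange_strong(expected, desired);
    return CasResult{ok, expected, target.load()};
}

// Small single-threaded demonstration of both outcomes.
inline void demo() {
    std::atomic<std::uint64_t> value{5};

    // Wrong guess (1): the CAS fails and 'expected' is updated to 5.
    CasResult failed = try_cas(value, 1, 9);
    assert(!failed.succeeded);
    assert(failed.expected_after == 5);
    assert(failed.value_after == 5);

    // Correct guess (5): the CAS succeeds and the atomic becomes 9.
    CasResult ok = try_cas(value, 5, 9);
    assert(ok.succeeded);
    assert(ok.value_after == 9);
}
```

The write-back is what lets a retry loop reuse `expected` without issuing a separate load after each failed attempt.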

Upvotes: 2
