intrigued_66

Reputation: 17248

InterlockedDecrement uses XADD but InterlockedIncrement uses INC?

I am debugging some code that uses a Boost C++ library, which in turn uses the Windows functions InterlockedDecrement and InterlockedIncrement.

In the generated assembly, InterlockedIncrement uses LOCK INC, whereas InterlockedDecrement uses LOCK XADD.

Why do they not both use LOCK XADD?

(This is on 64-bit Windows 7, compiling for 64-bit with MSVC 11.)

Upvotes: 2

Views: 595

Answers (2)

Dietrich Epp

Reputation: 213378

The INC instruction has a shorter encoding. You could implement both with LOCK XADD, but the code would take more space in memory. They are probably identical once they get turned into uops.

Now, why not use LOCK DEC?

The problem is that both functions (InterlockedDecrement and InterlockedIncrement) are specified to return the new value, that is, the value after the decrement or increment:

long InterlockedDecrement(volatile long *addend);
long InterlockedIncrement(volatile long *addend);

So if you set out to implement these functions, you will have to use something like LOCK XADD, because LOCK INC and LOCK DEC update memory but do not leave the resulting value in a register. Your default implementation for InterlockedDecrement will have to look something like this:

mov         eax, -1
lock xadd   DWORD PTR [rcx], eax
dec         eax
ret         0
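
The same shape can be written in portable C++. This is only a minimal sketch, assuming std::atomic<long> as a stand-in for the Win32 volatile long* interface; the function names interlocked_increment and interlocked_decrement are hypothetical, and this is not how the Windows headers implement the real functions. Like LOCK XADD, fetch_add hands back the old value, so the new value has to be computed afterwards:

```cpp
#include <atomic>

// Hypothetical stand-ins for InterlockedIncrement / InterlockedDecrement.
// fetch_add returns the value held *before* the addition, which is the
// same contract as LOCK XADD, so the new value is derived from it.
long interlocked_increment(std::atomic<long>& addend) {
    return addend.fetch_add(1) + 1;
}

long interlocked_decrement(std::atomic<long>& addend) {
    // Roughly mirrors: mov eax, -1 / lock xadd [rcx], eax / dec eax
    return addend.fetch_add(-1) - 1;
}
```

On x86-64, compilers typically lower a fetch_add whose result is used to LOCK XADD, though the exact instruction selection depends on the compiler and its optimization settings.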

During optimization passes, the compiler can then recognize that the return value of these functions is not being used, and replace them with LOCK INC or LOCK DEC.

It is common to see this pattern in your code for reference counting:

InterlockedIncrement(&refcount);
...

if (InterlockedDecrement(&refcount) == 0)
    ...

So, the compiler sees that the return value of InterlockedIncrement is discarded, and so it uses LOCK INC.

The compiler can also recognize that the return value of InterlockedDecrement is only used in a conditional, and it can substitute LOCK DEC, but this is a more complicated optimization, and there are more ways for it to fail to apply. So you may see LOCK INC in the disassembly paired with LOCK XADD, depending on whether the optimization happened for the decrement or not.
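
The reference-counting pattern above can be sketched with std::atomic to make the two cases easier to compare. This is an illustration under assumptions, not the code from the question: the RefCounted type, its destroyed flag, and the add_ref and release helpers are all hypothetical names.

```cpp
#include <atomic>

// Hypothetical reference-counted object; 'destroyed' stands in for
// whatever cleanup a real object would perform.
struct RefCounted {
    std::atomic<long> refcount{1};
    bool destroyed = false;
};

void add_ref(RefCounted& obj) {
    // The result is discarded, so only the memory update matters.
    // This is the case where a compiler may pick LOCK INC / LOCK ADD.
    obj.refcount.fetch_add(1);
}

bool release(RefCounted& obj) {
    // The result is used, but only to ask "did the count reach zero?".
    // fetch_sub returns the old value, so old == 1 means new == 0.
    if (obj.refcount.fetch_sub(1) == 1) {
        obj.destroyed = true;
        return true;
    }
    return false;
}
```

In add_ref the compiler has the easy case, since nothing reads the result. In release it has to prove that the value is needed only for a comparison against a constant before it can fall back to a flag-setting instruction, which is the harder optimization described above.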

We have a limited amount of insight into the original code and the logic that the compiler uses to select LOCK INC / LOCK DEC versus LOCK XADD, but I think it is enough to understand that LOCK DEC is an optimization, and there are two main reasons why the optimization may not happen:

  • The optimization may not be correct.
  • The optimization may be correct, but for various reasons, the compiler was unable to determine that the optimization was correct.

Upvotes: 5

Anonymous Coward

Reputation: 11

Contrary to what Dietrich Epp implies, LOCK DEC does modify the zero flag. So if you're just using InterlockedDecrement for reference counts and the like, where the only thing that matters is whether the result is zero, you can implement InterlockedDecrement with LOCK DEC. That is in fact how it was implemented on Windows 95 and on Windows NT 3.51 and earlier. These operating systems still supported the 80386, which didn't have the XADD instruction, so they had to use INC and DEC.

The Windows NT line dropped support for the 80386, and so Windows NT 4 and later all used XADD for both operations. I know it was like that on Windows XP and I've just verified it on Windows 10, so intrigued_66's claim that InterlockedIncrement still uses INC is certainly wrong. I'm guessing he disassembled some library code that used the instructions directly instead of the API call. Whatever piece of code that was, it didn't need the result, so it used LOCK INC.

In the most common use case, you can use LOCK DEC instead of InterlockedDecrement, and the resulting code will be faster than the API call with the overhead that entails.

Upvotes: 1
