MK.

Reputation: 4027

Understanding double to int64_t conversion

So I have two functions, one just casts from double to int64_t, the other calls std::round:

std::int64_t my_cast(double d)
{
  auto t = static_cast<std::int64_t>(d);
  return t;
}

std::int64_t my_round(double d)
{
  auto t = std::round(d);
  return t;
}

They work correctly: my_cast(3.64) = 3 and my_round(3.64) = 4. But when I look at the assembly, they seem to be doing the same thing. So I am wondering how they get different results.

$ g++ -std=c++1y -c -O3 ./round.cpp -o ./round.o 
$ objdump -dS ./round.o
./round.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z7my_castd>:
   0:   f2 48 0f 2c c0          cvttsd2si %xmm0,%rax
   5:   c3                      retq
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00

0000000000000010 <_Z8my_roundd>:
  10:   48 83 ec 08             sub    $0x8,%rsp
  14:   e8 00 00 00 00          callq  19 <_Z7my_castd+0x19> <========!!!
  19:   48 83 c4 08             add    $0x8,%rsp
  1d:   f2 48 0f 2c c0          cvttsd2si %xmm0,%rax
  22:   c3                      retq

Disassembly of section .text.startup:

0000000000000030 <_GLOBAL__sub_I__Z7my_castd>:
  30:   48 83 ec 08             sub    $0x8,%rsp
  34:   bf 00 00 00 00          mov    $0x0,%edi
  39:   e8 00 00 00 00          callq  3e <_GLOBAL__sub_I__Z7my_castd+0xe>
  3e:   ba 00 00 00 00          mov    $0x0,%edx
  43:   be 00 00 00 00          mov    $0x0,%esi
  48:   bf 00 00 00 00          mov    $0x0,%edi
  4d:   48 83 c4 08             add    $0x8,%rsp
  51:   e9 00 00 00 00          jmpq   56 <_Z8my_roundd+0x46>

I am not sure what the purpose of that callq at offset 14 is, but even with it, my_cast and my_round both seem to do just a cvttsd2si, which, I believe, is conversion with truncation.

However, as I mentioned earlier, the two functions produce different (correct) values for the same input (say 3.64).

What is happening?

Upvotes: 24

Views: 12432

Answers (3)

ach

Reputation: 2373

When dumping an object file with objdump -d, it is quite important to add the option -r, which commands the utility to also dump relocations:

$ objdump -dr round.o
...
0000000000000010 <_Z8my_roundd>:
  10:   48 83 ec 28             sub    $0x28,%rsp
  14:   e8 00 00 00 00          callq  19 <_Z8my_roundd+0x9>
                        15: R_X86_64_PC32       _ZSt5roundd
  19:   48 83 c4 28             add    $0x28,%rsp
  1d:   f2 48 0f 2c c0          cvttsd2si %xmm0,%rax

Now notice the new line that appeared. That is a relocation entry embedded in the object file. It instructs the linker to add the distance between _Z8my_roundd+0x9 and _ZSt5roundd to the value found at offset 15.

The e8 at offset 14 is the opcode for a relative call. The following 4 bytes must contain the IP-relative offset to the function being called (at the moment of execution the IP points to the next instruction). Because the compiler cannot know that distance, it leaves the bytes filled with zeroes and emits a relocation so that the linker can fill them in later.

When disassembling without the -r option, relocations are ignored, and that creates the illusion that the function _Z8my_roundd makes a call into the middle of itself.
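To make the arithmetic concrete, here is a minimal sketch of the value a linker would store in those 4 bytes. It assumes the usual R_X86_64_PC32 formula S + A - P (symbol address, plus addend, minus address of the patched field) and an addend of -4, which is what toolchains commonly emit for a call. The function names pc32_value and encode_le32 and the target address 0x400 are hypothetical, chosen only for illustration.

```cpp
#include <array>
#include <cstdint>

// Value stored for a PC-relative 32-bit relocation: S + A - P.
//   symbol_addr (S): final address of the called function
//   addend      (A): assumed to be -4 for a call, because the 4-byte field
//                    ends exactly where the next instruction begins
//   field_addr  (P): address of the 4 bytes being patched (0x15 above)
std::int32_t pc32_value(std::uint64_t symbol_addr,
                        std::int64_t addend,
                        std::uint64_t field_addr)
{
  const auto s = static_cast<std::int64_t>(symbol_addr);
  const auto p = static_cast<std::int64_t>(field_addr);
  return static_cast<std::int32_t>(s + addend - p);
}

// x86 stores the displacement in little-endian byte order.
std::array<std::uint8_t, 4> encode_le32(std::int32_t value)
{
  const auto u = static_cast<std::uint32_t>(value);
  return {{static_cast<std::uint8_t>(u & 0xffu),
           static_cast<std::uint8_t>((u >> 8) & 0xffu),
           static_cast<std::uint8_t>((u >> 16) & 0xffu),
           static_cast<std::uint8_t>((u >> 24) & 0xffu)}};
}
```

With the hypothetical target at 0x400, pc32_value(0x400, -4, 0x15) gives 0x400 - 0x19 = 0x3e7, so the instruction bytes would become e8 e7 03 00 00. With the field still zeroed, the call target is the next instruction plus 0, which is why the unrelocated disassembly prints callq 19.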

Upvotes: 13

Anton Savin

Reputation: 41301

Assembly output is more useful (g++ ... -S && cat round.s):

...
_Z7my_castd:
.LFB225:
    .cfi_startproc
    cvttsd2siq  %xmm0, %rax
    ret
    .cfi_endproc
...
_Z8my_roundd:
.LFB226:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    call    round             <<< This is what callq 19 means
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    cvttsd2siq  %xmm0, %rax
    ret
    .cfi_endproc

As you can see, my_round calls std::round and then executes the cvttsd2siq instruction. This is because std::round(double) returns a double, so its result still has to be converted to int64_t. That conversion is what cvttsd2siq does in both of your functions; in my_round it simply operates on a value that has already been rounded.
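The two steps can be written out explicitly. The sketch below restates the functions from the question with the intermediate double made visible, and adds a hypothetical my_llround variant that uses std::llround, which rounds and converts in a single library call.

```cpp
#include <cmath>
#include <cstdint>

// Converts by truncating toward zero; on x86-64 this is typically
// a single cvttsd2si instruction.
std::int64_t my_cast(double d)
{
  return static_cast<std::int64_t>(d);
}

// std::round returns a double rounded to the nearest integer value
// (halfway cases away from zero). The conversion that follows still
// truncates, but there is no fractional part left to discard.
std::int64_t my_round(double d)
{
  const double rounded = std::round(d);
  return static_cast<std::int64_t>(rounded);
}

// Hypothetical alternative: std::llround returns an integer directly.
long long my_llround(double d)
{
  return std::llround(d);
}
```

For 3.64, my_cast gives 3 while my_round and my_llround give 4; for -3.64 the results are -3 and -4 respectively, since truncation goes toward zero and rounding goes to the nearest integer.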

Upvotes: 19

manlio

Reputation: 18902

With g++ you can have a higher level view of what's happening using the -fdump-tree-optimized switch:

$ g++ -std=c++1y -c -O3 -fdump-tree-optimized ./round.cpp

That produces a round.cpp.165t.optimized file:

;; Function int64_t my_cast(double) (_Z7my_castd, funcdef_no=224, decl_uid=4743$

int64_t my_cast(double) (double d)
{
  long int t;

  <bb 2>:
  t_2 = (long int) d_1(D);
  return t_2;
}


;; Function int64_t my_round(double) (_Z8my_roundd, funcdef_no=225, decl_uid=47$

int64_t my_round(double) (double d)
{
  double t;
  int64_t _3;

  <bb 2>:
  t_2 = round (d_1(D));
  _3 = (int64_t) t_2;
  return _3;
}

Here the differences are quite clear (and the call to the round function glaring).

Upvotes: 18
