user34537

Assembly: C++ stack variable addresses different/wrong?

I don't understand why taking the address of one variable works while the other gives me 0xD, which then crashes when a value is written to that invalid address (0xD, in r.thefn(0);).

The output below shows that the two variables do not have similar addresses. I have also included what GDB showed and the assembly output. My x86 assembly isn't great (I have never written x86 assembly). I don't know if this shows enough information; if it doesn't, can you tell me what else is required to debug this? Why is one variable at 0xBF8BAF1C and the other at 0xD? The C++ and assembly code is below, but it is formatted better in the gist link above.

There is a static_assert enforcing String to be a POD, which means no nontrivial constructors. It uses the default constructor that C++ generates. It is also on the stack, which means an overloaded new wouldn't affect it. Operator & isn't overloaded, and the address also looks correct the first two times the function is called.

What might affect the r address? I can see variable varme's address is the same the second and third time it's called, but the third time r is magically different.

This compiles and runs properly using Visual C++ (2012 works) and g++ 4.6.2, but fails on Linux (Ubuntu) using g++ 3.7, 3.6.3 and Clang 3.0.

sanity check 1 0xbf8bb4cc
sanity check 2 0xbf8bb4cc 0xbf8bb538
sanity check 3 0xbf8bb4cc 0xbf8bb538
this 0xbf8bb538
sanity check 1 0xbf8baf1c
sanity check 2 0xbf8baf1c 0xbf8baf40
sanity check 3 0xbf8baf1c 0xbf8baf40
this 0xbf8baf40
sanity check 1 0xbf8baf1c
sanity check 2 0xbf8baf1c 0xd
sanity check 3 0xbf8baf1c 0xd
this 0xd

Here is the code. One additional note: there is a static_assert on String which enforces that it is a POD, which means no non-default constructor. I checked that operator & isn't overloaded.

String callingfn(char const *sz) /* signature as shown in the disassembly below */
{
    static int aa=0;
    aa++;
    int varme;

    printf("sanity check 1 %p\n", &varme);
    String r;
    printf("sanity check 2 %p %p\n", &varme, &r);
    //auto v=anotherfn(sz);
    printf("sanity check 3 %p %p\n", &varme, &r);
    //printf("callingfn=%s,%d %p %p\n", sz,aa, v, &r);
    r.thefn(0); // crashes on the third call, when &r is 0xd
    return r;
}

   ¦0x8084101 <callingfn(char const*)+1> mov %esp,%ebp ¦
   ¦0x8084103 <callingfn(char const*)+3> push %esi ¦
   ¦0x8084104 <callingfn(char const*)+4> sub $0x34,%esp ¦
   ¦0x8084107 <callingfn(char const*)+7> mov 0xc(%ebp),%eax ¦
   ¦0x808410a <callingfn(char const*)+10> mov 0x8(%ebp),%ecx ¦
   ¦0x808410d <callingfn(char const*)+13> mov %eax,-0x8(%ebp) ¦
   ¦0x8084110 <callingfn(char const*)+16> mov 0x81bc894,%eax ¦
   ¦0x8084115 <callingfn(char const*)+21> lea 0x1(%eax),%eax ¦
   ¦0x8084118 <callingfn(char const*)+24> mov %eax,0x81bc894 ¦
   ¦0x808411d <callingfn(char const*)+29> lea -0xc(%ebp),%eax ¦
   ¦0x8084120 <callingfn(char const*)+32> mov %esp,%edx ¦
   ¦0x8084122 <callingfn(char const*)+34> mov %eax,0x4(%edx) ¦
   ¦0x8084125 <callingfn(char const*)+37> movl $0x812ee78,(%edx) ¦
   ¦0x808412b <callingfn(char const*)+43> mov %ecx,-0x10(%ebp) ¦
   ¦0x808412e <callingfn(char const*)+46> mov %eax,-0x14(%ebp) ¦
   ¦0x8084131 <callingfn(char const*)+49> call 0x8049a90 <printf@plt> ¦
   ¦0x8084136 <callingfn(char const*)+54> mov %esp,%ecx ¦
   ¦0x8084138 <callingfn(char const*)+56> mov -0x10(%ebp),%edx ¦
   ¦0x808413b <callingfn(char const*)+59> mov %edx,0x8(%ecx) ¦
   ¦0x808413e <callingfn(char const*)+62> mov -0x14(%ebp),%esi ¦
   ¦0x8084141 <callingfn(char const*)+65> mov %esi,0x4(%ecx) ¦
   ¦0x8084144 <callingfn(char const*)+68> movl $0x812ee8b,(%ecx) ¦
   ¦0x808414a <callingfn(char const*)+74> mov %eax,-0x18(%ebp) ¦
   ¦0x808414d <callingfn(char const*)+77> call 0x8049a90 <printf@plt> ¦
   ¦0x8084152 <callingfn(char const*)+82> mov %esp,%ecx ¦
   ¦0x8084154 <callingfn(char const*)+84> mov -0x10(%ebp),%edx ¦
   ¦0x8084157 <callingfn(char const*)+87> mov %edx,0x8(%ecx) ¦
   ¦0x808415a <callingfn(char const*)+90> mov -0x14(%ebp),%esi ¦
   ¦0x808415d <callingfn(char const*)+93> mov %esi,0x4(%ecx) ¦
   ¦0x8084160 <callingfn(char const*)+96> movl $0x812eea1,(%ecx) ¦
   ¦0x8084166 <callingfn(char const*)+102> mov %eax,-0x1c(%ebp) ¦
   ¦0x8084169 <callingfn(char const*)+105> call 0x8049a90 <printf@plt> ¦
   ¦0x808416e <callingfn(char const*)+110> mov %esp,%ecx ¦
   ¦0x8084170 <callingfn(char const*)+112> mov -0x10(%ebp),%edx ¦
   ¦0x8084173 <callingfn(char const*)+115> mov %edx,(%ecx) ¦
   ¦0x8084175 <callingfn(char const*)+117> movl $0x0,0x4(%ecx) ¦
   ¦0x808417c <callingfn(char const*)+124> mov %eax,-0x20(%ebp) ¦
   ¦0x808417f <callingfn(char const*)+127> call 0x8056d00 <SomeClass<blah>::thefn(blah*)> ¦
  >¦0x8084184 <callingfn(char const*)+132> add $0x34,%esp ¦
   ¦0x8084187 <callingfn(char const*)+135> pop %esi ¦
   ¦0x8084188 <callingfn(char const*)+136> pop %ebp ¦
   ¦0x8084189 <callingfn(char const*)+137> ret $0x4 ¦
   ¦0x808418c nopl 0x0(%eax)

Answers (1)

user1233508

The setup:

String is defined as:

struct String {
    void *p;
    #ifdef __cplusplus
    /* Operators to help with comparing, etc. */
    /* No additional data members */
    void thefn(int arg); /* Return/argument type not relevant */
    #endif
};

and includes asserts to verify sizeof(String) == sizeof(void *) and the struct's POD-ness.
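
For reference, checks of that kind might look roughly like the sketch below. This is my own reconstruction, not the asker's code: the body of thefn is a placeholder, as_raw_pointer is a hypothetical helper, and "POD" is spelled out as its two C++11 components.

```cpp
#include <cassert>      // assert
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_trivial, std::is_standard_layout

// Stand-in for the String type above; the body of thefn is a placeholder.
struct String {
    void *p;
    void thefn(int) { p = nullptr; }
};

// Same size as a raw pointer.
static_assert(sizeof(String) == sizeof(void *), "String must be pointer-sized");
// POD-ness, expressed as trivial + standard-layout.
static_assert(std::is_trivial<String>::value, "String must be trivial");
static_assert(std::is_standard_layout<String>::value, "String must be standard-layout");

// Hypothetical helper: hand out the raw pointer that a String wraps.
inline void *as_raw_pointer(const String &s) {
    assert(offsetof(String, p) == 0);  // p is the first and only data member
    return s.p;
}
```

Note that these checks only describe the object's size and layout in memory. They say nothing about how a String is passed back from a function, which is where the trouble starts.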

This part was not originally mentioned in the question: the function that calls this function returns the same String object to its own caller, but that caller is external C code which expects a plain void * instead of a String. The author's expectation was that this should work, because the size and layout of the return value are the same.

The problem:

The C++ compiler returns String through a hidden pointer argument rather than in a register, and with named return value optimization (NRVO) it constructs r directly in the storage that pointer refers to. In effect, the function signature changed from

String fn(char const *);

to

void fn(String *, char const *);

This is visible in the disassembly: both ebp+0x8 and ebp+0xC are read on entry, the value from ebp+0x8 is what later gets printed as &r and passed as this to thefn, and there's no effort spent on putting a meaningful result into EAX. The ret $0x4 part looked a little strange at first, but it means the callee itself pops the hidden pointer off the stack, leaving only the ordinary argument for the caller to clean up. That is how GCC and Clang implement struct returns on 32-bit x86 Linux.

Presumably, the same convention was applied in the caller function. But the C compiler saw no reason to do so (after all, it expected the result to be a void *, not a structure) and expected the return value to be passed like any pointer-sized result would be.

As a result:

  1. The C code passes only one argument into C++ code that expects two, so whatever happens to be on the stack in the extra slot gets interpreted as the address where the result should be written.
  2. The C++ code doesn't produce a meaningful return value where the C code expects to find one.
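
To make the mismatch concrete, here is a hand-written sketch of the two shapes of the same function. It only illustrates the idea and is not what the compiler literally emits; make_by_value and make_lowered are hypothetical names, and the real lowering happens at the ABI level rather than in source code.

```cpp
#include <cassert>

struct String {
    void *p;
};

// What the source code says: the result is returned by value.
String make_by_value(char const *sz) {
    String r;
    r.p = const_cast<char *>(sz);
    return r;
}

// Roughly what the compiled code does: the caller supplies the storage
// for the result, and the function writes straight into it.
void make_lowered(String *result, char const *sz) {
    // Only catches null. A bogus address such as 0xD is not detected here,
    // and the write below would crash, as in the question.
    assert(result != nullptr);
    result->p = const_cast<char *>(sz);
}
```

A caller that believes the function has the first shape never provides the result pointer, so the second shape ends up writing through whatever value is sitting in that stack slot.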

The solution:

The obvious first step towards a fix is to make sure the C code expects the same return value as the C++ code, a struct instead of a pointer.

Note that the hidden-pointer return is dictated by the platform's calling convention for the declared return type, not by whether NRVO happens to be applied. So once both sides declare the same struct type, compilers targeting the same ABI should agree on how it is returned. I have no idea if extern "C" would have any effect on it.
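
Another option, if the C side really wants a void *, is to keep String out of the C-facing signature altogether and unwrap it on the C++ side. A minimal sketch, assuming the C code only ever needs the wrapped pointer; make_string and make_string_for_c are hypothetical names, not functions from the question:

```cpp
#include <cassert>

struct String {
    void *p;
};

// Hypothetical internal C++ function that returns String by value.
static String make_string(char const *sz) {
    String r;
    r.p = const_cast<char *>(sz);
    return r;
}

// Hypothetical C-facing entry point. It returns a plain pointer, so the
// struct never crosses the language boundary and both sides agree on
// how the result comes back.
extern "C" void *make_string_for_c(char const *sz) {
    assert(sz != nullptr);
    String r = make_string(sz);
    return r.p;
}
```

The C code would then declare void *make_string_for_c(char const *sz); and call it like any other function returning a pointer.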

(This answer summarizes what was said in the comments, with some guesswork to fill the gaps)
