Reputation: 9090
I am doing some reverse engineering on binaries for the 32-bit x86 architecture.
Recently I noticed an interesting optimization in how C source code was compiled to assembly.
For example, the original source code (from the OpenSSL library) looks like this:
powerbufFree = (unsigned char *)malloc(powerbufLen);
After compilation (GCC 4.8.4, -O3), the assembly code looks like this:
807eaa0: cmp eax, 0xbff # eax holds the length of the buf.
807eaa5: mov dword ptr [ebp-0x68], eax # store the length of powerbuf on the stack
807eaa8: jnle 0x807ec60 # 0x807ec60 refers to the malloc
807eaae: mov edx, eax
807eab0: add eax, 0x5e
807eab3: and eax, 0xfffffff0
807eab6: sub esp, eax
807eab8: lea eax, ptr [esp+0x23]
807eabc: and eax, 0xffffffc0
807eabf: add eax, 0x40
807ead3: mov dword ptr [ebp-0x60], eax # store the base addr of the buf on the stack.
To my surprise, the buffer really is allocated on the stack! It looks to me like an optimization of the heap allocation, but I am not sure.
So here is my question: does the above optimization (malloc --> stack allocation) look familiar to anyone? Does it make sense? Could anyone point me to a manual or specification that describes such an optimization?
Upvotes: 8
Views: 573
Reputation: 70362
From the source of bn_exp.c:
0634 #ifdef alloca
0635 if (powerbufLen < 3072)
0636 powerbufFree = alloca(powerbufLen+MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH);
0637 else
0638 #endif
0639 if ((powerbufFree=(unsigned char*)OPENSSL_malloc(powerbufLen+MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH)) == NULL)
0640 goto err;
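Note that 0xbff is equal to 3071, so the jnle instruction branches to the malloc path when powerbufLen > 3071 and falls through to the stack path when powerbufLen < 3072, exactly as on line 635. On systems that support it, alloca performs stack allocation. This is true of the GNU version used on Linux, and of the BSD implementations, which copied this API from AT&T's 32V UNIX (according to FreeBSD).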
You only looked at line 639, but if alloca is defined, the C code matches your assembly: the stack allocation comes from the source's own use of alloca, not from GCC transforming the malloc call.
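To make the correspondence concrete, here is a minimal sketch of the same pattern in plain C. It is not OpenSSL code: the names STACK_LIMIT, CACHE_LINE, align_up and sum_with_scratch are hypothetical, the 64-byte cache-line width is an assumption read off the "and eax, 0xffffffc0" / "add eax, 0x40" pair, and alloca comes from glibc's <alloca.h>. The align_up helper always moves the pointer forward by 1 to 64 bytes to reach a 64-byte boundary, which would explain why bn_exp.c reserves MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH extra bytes.

```c
/* Sketch of the "small on the stack, large on the heap" pattern.
 * Assumptions: 3072-byte cutoff and 64-byte cache line are hard-coded;
 * all names are hypothetical. alloca() is non-standard (glibc header). */
#include <alloca.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE  64    /* assumed cache-line width */
#define STACK_LIMIT 3072  /* 0xbff + 1: the asm takes malloc when len > 3071 */

/* Round p up to the next 64-byte boundary, moving forward by 1..64 bytes.
 * p + (64 - (p & 63)) equals (p & ~63) + 64, the "and"/"add" pair above. */
static unsigned char *align_up(unsigned char *p)
{
    return p + (CACHE_LINE - ((uintptr_t)p & (CACHE_LINE - 1)));
}

/* Hypothetical caller: fills a len-byte aligned scratch buffer with `fill`
 * and returns the sum of its bytes, or -1 if malloc fails. */
static long sum_with_scratch(size_t len, unsigned char fill)
{
    unsigned char *free_me = NULL;  /* non-NULL only for heap allocations */
    unsigned char *raw;

    if (len < STACK_LIMIT) {
        /* Stack: released automatically when this function returns. */
        raw = alloca(len + CACHE_LINE);
    } else {
        raw = free_me = malloc(len + CACHE_LINE);
        if (raw == NULL)
            return -1;
    }

    /* Extra CACHE_LINE bytes guarantee len usable bytes after aligning. */
    unsigned char *buf = align_up(raw);
    memset(buf, fill, len);

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];

    free(free_me);  /* free(NULL) is a no-op on the alloca path */
    return sum;
}
```

For example, sum_with_scratch(100, 2) takes the alloca branch and sum_with_scratch(5000, 1) the malloc branch. The alloca memory must never be returned to the caller, since it vanishes when the function returns.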
This technique is often used to avoid the cost of calling malloc for a temporary buffer when the allocation is relatively small. In C99, a VLA (variable-length array) could be used instead (since C11, VLAs are an optional feature).
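A minimal sketch of the VLA variant, assuming a compiler with VLA support (the block refuses to compile if __STDC_NO_VLA__ is defined). The 1024-byte cutoff and the reverse_copy / reverse_via helpers are hypothetical. Unlike alloca memory, a VLA is released when its enclosing block ends rather than when the function returns, and a zero-length VLA is undefined behavior, so n == 0 is handled separately.

```c
/* Sketch: small scratch buffers as a C99 VLA, large ones from malloc.
 * VLA_LIMIT and the helper names are hypothetical. */
#include <stdlib.h>
#include <string.h>

#ifdef __STDC_NO_VLA__
#error "this sketch needs VLA support"
#endif

#define VLA_LIMIT 1024

/* Copies src reversed into dst, using scratch (n bytes) as the temp area. */
static void reverse_via(char *scratch, const char *src, char *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        scratch[i] = src[n - 1 - i];
    memcpy(dst, scratch, n);
}

/* Returns 0 on success, -1 if the heap allocation fails. */
static int reverse_copy(const char *src, char *dst, size_t n)
{
    if (n == 0)
        return 0;                  /* a zero-length VLA would be UB */
    if (n <= VLA_LIMIT) {
        char scratch[n];           /* VLA: on the stack until this block ends */
        reverse_via(scratch, src, dst, n);
        return 0;
    }
    char *scratch = malloc(n);     /* too large for the stack: use the heap */
    if (scratch == NULL)
        return -1;
    reverse_via(scratch, src, dst, n);
    free(scratch);
    return 0;
}
```

As with alloca, nothing checks that a VLA actually fits on the stack, which is why the cutoff matters.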
Sometimes the technique simply uses a fixed-size buffer of some reasonably small size. For example:
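char tmp_buf[1024];
char *tmp = tmp_buf;
if (bytes_needed > sizeof tmp_buf) {
    tmp = malloc(bytes_needed);  /* too large for the stack buffer: use the heap */
    if (tmp == NULL) {
        /* handle allocation failure */
    }
}
/* ... use tmp ... */
if (tmp != tmp_buf) {
    free(tmp);  /* only free memory that came from malloc */
}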
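The fixed-buffer version can be wrapped in a complete function. In this sketch, count_vowels is a hypothetical example that needs a modifiable, lower-cased copy of its input: if the string plus its terminator fits in 1024 bytes, the copy stays in tmp_buf on the stack, otherwise it goes to malloc.

```c
/* Sketch of the fixed-size stack buffer with heap fallback.
 * count_vowels and TMP_SIZE are hypothetical names. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TMP_SIZE 1024

/* Counts ASCII vowels in s using a lower-cased private copy (s is not
 * modified). Returns -1 if the heap allocation fails. */
static long count_vowels(const char *s)
{
    char tmp_buf[TMP_SIZE];
    char *tmp = tmp_buf;
    size_t bytes_needed = strlen(s) + 1;   /* include the terminator */

    if (bytes_needed > sizeof tmp_buf) {
        tmp = malloc(bytes_needed);        /* too large for the stack buffer */
        if (tmp == NULL)
            return -1;
    }

    for (size_t i = 0; i < bytes_needed; i++)
        tmp[i] = (char)tolower((unsigned char)s[i]);

    long n = 0;
    for (const char *p = tmp; *p != '\0'; p++)
        if (strchr("aeiou", *p) != NULL)
            n++;

    if (tmp != tmp_buf)                    /* only free heap memory */
        free(tmp);
    return n;
}
```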
Upvotes: 6