Understanding what this x86 assembly function does, recursion

Question

I have this Assembly function here, and have run through it multiple times trying to understand what it does and what the pattern is. I have hit a brick wall in terms of understanding it's pattern. Any form of guidance is appreciated here.

   0x0000555555555cbe <+0>: push   %rbx
   0x0000555555555cbf <+1>: mov    %edx,%eax
   0x0000555555555cc1 <+3>: sub    %esi,%eax
   0x0000555555555cc3 <+5>: mov    %eax,%ebx
   0x0000555555555cc5 <+7>: shr    $0x1f,%ebx
   0x0000555555555cc8 <+10>: add    %eax,%ebx
   0x0000555555555cca <+12>: sar    %ebx
   0x0000555555555ccc <+14>: add    %esi,%ebx
   0x0000555555555cce <+16>: cmp    %edi,%ebx
   0x0000555555555cd0 <+18>: jg     0x555555555cda 
   0x0000555555555cd2 <+20>: cmp    %edi,%ebx
   0x0000555555555cd4 <+22>: jl     0x555555555ce6 
   0x0000555555555cd6 <+24>: mov    %ebx,%eax
   0x0000555555555cd8 <+26>: pop    %rbx
   0x0000555555555cd9 <+27>: retq  
   0x0000555555555cda <+28>: lea    -0x1(%rbx),%edx
   0x0000555555555cdd <+31>: callq  0x555555555cbe 
   0x0000555555555ce2 <+36>: add    %eax,%ebx
   0x0000555555555ce4 <+38>: jmp    0x555555555cd6 
   0x0000555555555ce6 <+40>: lea    0x1(%rbx),%esi
   0x0000555555555ce9 <+43>: callq  0x555555555cbe 
   0x0000555555555cee <+48>: add    %eax,%ebx
   0x0000555555555cf0 <+50>: jmp    0x555555555cd6

If my Input is "6 1" the Registers are as follows:

rax            0x2 2
rbx            0x7fffffffe0c8 140737488347336
rcx            0x0 0
rdx            0xe 14
rsi            0x0 0
rdi            0x6 6
rbp            0x555555557170 0x555555557170 <__libc_csu_init>
rsp            0x7fffffffdfb8 0x7fffffffdfb8
r8             0x0 0
r9             0x0 0
r10            0x7ffff7b82f20 140737349431072
r11            0x55555555763a 93824992245306
r12            0x5555555558f0 93824992237808
r13            0x7fffffffe0c0 140737488347328
r14            0x0 0
r15            0x0 0
rip            0x555555555cbe 0x555555555cbe 
eflags         0x293 [ CF AF SF IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0 0
es             0x0 0
fs             0x0 0
gs             0x0 0
k0             0x0 0
k1             0x0 0
k2             0x0 0
k3             0x0 0
k4             0x0 0
k5             0x0 0
k6             0x0 0
k7             0x0 0

I have tried following the code step by step, and with many inputs, and can't seem to understand the pattern that this function is following.

RDX Always seems to be 14, regardless of input. And from what I can understand there is a loop to see if a certain combination of shifting and subtracting is greater than or less than the original input.

These are the inputs i've tried, and their following outputs:

14, 1 = 45

13, 1 = 31

5, 1 = 15

6, 1 = 21

7, 1 = returns 7??

8, 1 = 35

Aplet123 · Accepted Answer

I always like to have a call graph for my assembly, so I assembled it in Binary Ninja:

(intel syntax)

(at&t syntax)

Personally, I prefer intel syntax, so I'll be using it from now on. We can see 3 registers that are read from before being written to, and those are edi, esi, and edx. Those also happen to be the first 3 argument registers in the standard x86 calling convention, so we can infer this function has 3 arguments. Here it is with comments (to understand why shifting by 0x1f yields the sign bit, you must first understand two's complement):

Here's the equivalent C code:

int func4(int a, int b, int c) {
    int num = (b + c + (c < b)) / 2;
    if (num > a) {
        return func4(a, b, num - 1) + num;
    } else if (num < a) {
        return func4(a, num + 1, c) + num;
    } else {
        return num;
    }
}

Now that we have clean, readable C code, we can start to reason about what the function actually does. (b + c) / 2 is an average, and c < b is always 0 or 1, which means the num variable must be very close to the average. In fact, if both b and c are even or both odd, then num is exactly the average, since the possible addition of 1 gets rounded down by integer division and doesn't change the value. However, if only one of b and c is even, then the possible addition actually matters. We'll look at two cases:

b is less than c, in which case c < b would evaluate to 0, so num would just be the average, rounded down due to integer division. Since b is less than c, rounding down is rounding towards b.
b is greater than c, in which case c < b would evaluate to 1, so num would be the average plus a half, which pushes it up to the next value. This means that num is the average rounded up, and since b is greater than c, it's rounded towards b.

So, b is essentially the average of b and c but rounded towards the value of b. Now we can look at the actual recursive cases. One thing to note is that a never changes. In fact, the function never terminates until the average of b and c reaches a. If the average is greater than a, then c is decreased to the average minus one to bring the average down, and if the average is less than a, then b is increased to the average plus one to bring the average up. Now, this is assuming that b < c. Why? This function really hates when b > c and all its logic breaks and it goes into infinite recursion, due to the fact that setting b to the average actually decreases b instead of increasing b, never allowing the average to reach a. So, in the end, you have some weird binary search where the average either starts above or below the expected value, then goes to the other side, then slowly approaches the desired value, and returns the sum of all averages along the way. This is also why func4(7, 1, 14) returned 7: 7 is the average of 1 and 14 so it returned immediately.

Understanding what this x86 assembly function does, recursion

Answers (1)

Related Questions