schwooples
schwooples

Reputation: 23

I need help figuring out what this C code does

I've got this snippet of C code from Ghidra, and I can't quite figure out what it's doing. I suspect some kind of root, maybe?

The two args passed in are a sum of squares (sometimes 2 sometimes 3 terms), and an extra value such as 0x18, 0x10 or, 0 (sometimes this arg is absent!)

uint FUN_80059070(uint param_1,uint param_2)

{
  uint uVar1;
  uint uVar2;
  uint uVar3;
  uint uVar4;
  uint uVar5;
  uint uVar6;

  uVar5 = 0;
  uVar4 = 1;
  uVar6 = 0;
  uVar3 = 1 << (param_2 & 0x1f);
  while ((uVar3 < param_1 && (uVar3 << 2 != 0))) {
    uVar4 = uVar4 + 1;
    uVar3 = uVar3 << 2;
  }
  uVar1 = 1 << (uVar4 + (param_2 - 1) & 0x1f);
  while (uVar3 != 0) {
    uVar2 = uVar5 << (uVar4 & 0x1f);
    if ((int)uVar4 < 0) {
      uVar2 = uVar5 >> (-uVar4 & 0x1f);
    }
    uVar2 = uVar2 + uVar6 + uVar3;
    if (uVar2 <= param_1) {
      uVar5 = uVar5 + uVar1;
      uVar6 = uVar2;
    }
    uVar1 = uVar1 >> 1;
    uVar3 = uVar3 >> 2;
    uVar4 = uVar4 - 1;
  }
  return uVar5;
}

Upvotes: 0

Views: 2278

Answers (1)

klutt
klutt

Reputation: 31409

A decently good way to understand code is to refactor it.

First, create a test function and a couple of test cases. Then you rewrite the function. It can look like this. It's very simplistic and for a larger refactoring, I would make this a little bit more sophisticated.

bool test(uint param_1, uint param_2) 
{
    return (FUN_80059070(param_1, param_2) == my_func(param_1, param2));
}

int main()
{
    uint test_cases[3][2] = { {0,0}, {8, 12}, {12, 14}};

    for(int i=0; i<3; i++) {
        if(! test(test_cases[i][0], test_cases[i][1])) {
             printf("Case %d with values %d and %d failed\n", 
                     i, test_cases[i][0], test_cases[i][1]);
             exit(EXIT_FAILURE);
        }
    }

    printf("All tests passed\n");
}

Since you have known conditions on the parameters, consider writing a snippet that creates test cases for you. Create a lot of test cases, but be aware of the risk of overflow.

After that, you can start the process of refactoring. Start by copying the whole body of FUN_80059070 to my_func and then we will replace lines and code blocks.

For instance, start by investigating what 1 << (param_2 & 0x1f); actually does by googling and testing for different values. When you understand what it does, you create a function.

uint describing_name(uint x) { return (x & 0x1f); }

and change the line initializing uVar3 to

uVar3 = 1 << describing_name(param_2);

Then take small steps. For instance, uVar3 << 2 is equivalent to uVar * 4, but the latter is easier to read. In the more general case, x << y is the same as x * pow(2,y). Be aware that pow has the signature double pow(double, double) so either cast or write your own integer variant.

Then work your way through the code iteratively and run the tests every time. If some part of the code is particularly tricky, you can or course create a separate test with appropriate test cases for that function.

Please note that replacing << with pow does not necessarily always makes sense. Sometimes they are used for bit manipulation, and sometimes just as faster multiplication. For a compiler with a bad or without optimizer it can make a huge performance difference. In those cases it makes sense to replace them with pow but in other cases << can be used for removing the most significant bits.

For instance, I don't know how many bits uint is on your system, but x & 0x1f will return the number you get if you set all bits except the 15 least significant. It could be the case that big or small endian matters here. I don't think so, but I'm not sure. If I'm correct, then x & 0x1f is the same as x % 32 which is the modulo operation. This is also a common optimization. Bitshifts is much faster than multiplications and modulo. So we can rename the function describing_name to modulo32.

The if((int)uVar4 < 0) is basically a "clever" way of checking if the most significant bit is set, or if uVar4 contains a larger number than signed int can represent. Both interpretations are equivalent.

And now it looks like this:

uint modulo32(uint x) { return (x & 0x1f); }

bool larger_than_INT_MAX(uint x) { return (int)x<0; }

uint my_func(uint param_1, uint param_2)
{
         uint uVar1, uVar2, uVar3, uVar4, uVar5, uVar6;

         uVar5 = 0;
         uVar4 = 1;
         uVar6 = 0;
         uVar3 = powi(2, modulo32(param_2));
         while ((uVar3 < param_1 && (uVar3 * 4 != 0))) {
                 uVar4 = uVar4 + 1;
                 uVar3 = uVar3 * 4;
         }

         uVar1 = powi(2, uVar4 + (modulo32(param_2-1)));
         while (uVar3 != 0) {
                 uVar2 = uVar5 * powi(2, modulo32(uVar4));
                 if (larger_than_INT_MAX(uVar4)) {
                         uVar2 = uVar5 / powi(2, -uVar4);
                 }

                 uVar2 = uVar2 + uVar6 + uVar3;
                 if (uVar2 <= param_1) {
                         uVar5 = uVar5 + uVar1;
                         uVar6 = uVar2;
                 }
                 uVar1 = uVar1 / 2;
                 uVar3 = uVar3 / 4;
                 uVar4 = uVar4 - 1;
         }
         return uVar5;
}

powi is a simple integer power function I wrote. The above code is still not very easy to understand, but at least the weird bit operations and "clever" code parts are out of the way.

Now I note something. uVar3 * 4 != 0 does not really make sense as a mathematical operation, because this would only be true for uVar3 == 0. But what this does is to check if all bits EXCEPT the two most sigificant are zero or not. So you could replace it with this function:

bool fourteen_least_significant_bits_are_not_zero(uint x) {
    return x << 2 != 0;
}

Replace function names or use comments when you get more understanding of what the code actually means. Also replace the pretty anonymous names uVar1, uVar2 etc when you know what they do.

After this, I'd suggest trying to rename this function:

void describing_name(uint *uVar3p, uint *uVar4p, uint param_1)
         // These are declared so that you can just copy paste the code
         uint uVar3 = *uVar3p;
         uint uVar4 = *uVar4p;

         // Copy paste with no modifications
         while ((uVar3 < param_1 && 
                 fourteen_least_significant_bits_are_not_zero(uVar3)) {
                 uVar4 = uVar4 + 1;
                 uVar3 = uVar3 * 4;
         }

         // And write back the values
         *uVar3p = uVar3;
         *uVar4p = uVar4;
}

Replace the while loop with:

describing_name(&uVar3, &uVar4, param_1);

Refactoring the code is often the best way to understand it. And remember, testing is vital when refactoring.

Upvotes: 2

Related Questions