Raptor2277

Reputation: 33

ASM algorithm decoding

I am trying to understand this ASM problem. Here is the code:

45 33 C9                    xor r9d, r9d 
C7 44 24 18 50 72 69 6D     mov [rsp+arg_10], 6D697250h 
66 C7 44 24 1C 65 53        mov [rsp+arg_14], 5365h 
C6 44 24 1E 6F              mov [rsp+arg_16], 6Fh 
4C 63 C1                    movsxd r8, ecx 
85 C9                       test ecx, ecx 
7E 1C                       jle short locret_140001342 
41 8B C9                    mov ecx, r9d 
            loc_140001329: 
48 83 F9 07                 cmp rcx, 7 
49 0F 4D C9                 cmovge rcx, r9 
48 FF C1                    inc rcx 
8A 44 0C 17                 mov al, [rsp+rcx+arg_F] 
30 02                       xor [rdx], al 
48 FF C2                    inc rdx 
49 FF C8                    dec r8 
75 E7                       jnz short loc_140001329 
            locret_140001342: 
C3                          retn  

And here is the encoded text:

07 1D 1E 41 45 2A 00 25 52 0D 04 01 73 06 
24 53 49 39 0D 36 4F 35 1F 08 04 09 73 0E 
34 16 1B 08 16 20 4F 39 01 49 4A 54 3D 1B 
35 00 07 5C 53 0C 08 1E 38 11 2A 30 13 1F 
22 1B 04 08 16 3C 41 33 1D 04 4A  

I've been studying ASM for some time now and I know what most of the instructions do, but I still have some questions I have not found the answers to.

How do I plug the encoded text into the algorithm?
What are arg_10, arg_14, etc.? I assume they come from the encoded part, but I don't know exactly.

Could someone go through this algorithm line by line? I understand some of it, but I need some clarification.

I have been using Visual Studio and C++ to test ASM. I know that to call an ASM procedure you can declare a function like this:

extern "C" int function(int a, int b, int c,int d, int f, int g);  

and use it like this

printf("ASM Returned %d", function(92,2,3,4,5,6));  

I am also aware that the first four integer parameters go into RCX, RDX, R8 and R9, and the rest are on the stack. I don't know much about the stack, so I don't know how to access them yet. I also know that the return value is whatever RAX contains. So something like this would add two numbers:

xor eax, eax
mov eax, ecx
add eax, edx
ret  

So as Jester suggested, I will go line by line explaining what I think the code does.

xor r9d, r9d                  //xor on r9d (clears the register)
mov [rsp+arg_10], 6D697250h   //moves 6D697250 to the address pointed at by rsp + arg_10
mov [rsp+arg_14], 5365h       //moves 5365 to the address pointed at by rsp + arg_14
mov [rsp+arg_16], 6Fh         //moves 6F to the address pointed at by rsp + arg_16
movsxd r8, ecx                //moves ecx to r8 and sign-extends it, since ecx is 32 bits and r8 is 64 bits
test ecx, ecx                 //tests exc and sets the labels
jle short locret_140001342    //jumps to ret if ecx is zero or less
mov ecx, r9d                  //moves the lower 32 bits of r9 into ecx

loc_140001329:                //label used by jump commands
cmp rcx, 7                    //moves 7(decimal) into rcx
cmovge rcx, r9                //don't know
inc rcx                       //increases rcx by 1
mov al, [rsp + rcx + arg_F]   //moves the value at address [rsp + rcx + arg_F] into al,
                              //this is probably the key step: al is 1 byte and each character is also 1 byte; al is also the low byte of rax, so it holds the value to be returned
xor [rdx], al                 //xor on the value at address [rdx] and al, stores the result at the address of [rdx]
inc rdx                      //increase rdx by 1
dec r8                       //decrease r8 by 1
jnz short loc_140001329      //if r8 is not zero jump back to loc_140...
                             //this essentially is a while loop until r8 reaches 0 (assuming it starts as positive)
locret_140001342:
ret  

I still don't know what the arg_xx are or how exactly is the encoded text plugged into this algorithm.

Upvotes: 2

Views: 1565

Answers (4)

Raptor2277

Reputation: 33

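OK, I have figured out the algorithm and have made it work in ASM as well. You guys were right: the arg_xx names are offsets. arg_10 is 0x10 bytes and arg_F is 0x0F bytes past arg_0; the opcode bytes show that the real displacements from rsp are 8 larger (18h, 1Ch, 1Eh and 17h), because arg_0 sits at rsp + 8, just above the return address. The data is passed in as an array together with its length, so ecx holds the data length (67 bytes in this case) and rdx points to the beginning of the array. Here is the declaration I used in C++ to call the ASM procedure: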

extern "C" void function(int length, char* message);  

The algorithm is pretty simple. The key phrase is "PrimeSo". It XORs each byte passed in with the characters of "PrimeSo" in order, and once it reaches the 'o' it goes back to 'P'. That is what these lines do:

cmp rcx, 7       
cmovge rcx, r9   //as Peter de Rivaz stated, this puts 0 into rcx if it is greater than or equal to 7
inc rcx 

and so

mov al, [rsp + rcx + arg_F]

will effectively become [rsp + 1 + arg_F], [rsp + 2 + arg_F], ..., [rsp + 7 + arg_F]. Note that "PrimeSo" was stored starting at [rsp + arg_10], so [rsp + 1 + arg_F] points to 'P'. In each iteration of the loop, al becomes the next character of "PrimeSo", cycling through them.
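To convince yourself that the index really cycles through 1..7, here is a small C++ simulation of just the index logic. The function name `rcx_sequence` is made up for this illustration, and it assumes r9 is 0, as set by `xor r9d, r9d`. In the encoded instruction (8A 44 0C 17) the displacement is 17h.

```cpp
#include <cstdint>
#include <vector>

// Simulates only the index part of the loop (cmp rcx, 7 / cmovge rcx, r9 /
// inc rcx) and records the rcx value used by mov al, [rsp + rcx + 17h].
std::vector<std::int64_t> rcx_sequence(int iterations)
{
    std::vector<std::int64_t> seen;
    const std::int64_t r9 = 0;   // r9d was cleared by xor r9d, r9d
    std::int64_t rcx = r9;       // mov ecx, r9d
    for (int i = 0; i < iterations; ++i) {
        if (rcx >= 7) rcx = r9;  // cmp rcx, 7 + cmovge rcx, r9
        ++rcx;                   // inc rcx
        seen.push_back(rcx);     // 1..7, so rcx + 17h walks 18h..1Eh
    }
    return seen;
}
```

For the i-th byte this yields (i % 7) + 1, so the load reads key[rcx - 1], which is key[i % 7]: the extra 1 added by `inc rcx` is cancelled by using arg_F instead of arg_10.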

xor [rdx], al //This XORs [rdx] (the beginning of our message) with al, which is 'P' in the first iteration.
              //It then stores the result in its place.

inc rdx       //move to next character
dec r8        //decrease counter
jnz short loc_140001329 //and start the loop again  
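For comparison, here is a sketch of the same loop in plain C++, assuming the same (length, pointer) parameters as the ASM routine. `xor_with_key` is a hypothetical name; it uses i % 7 instead of the compare-and-reset trick, which produces the same key sequence.

```cpp
#include <cstddef>

// Portable sketch of the routine: XOR every byte in place with the next
// character of "PrimeSo", wrapping back to 'P' after 'o'.
void xor_with_key(int length, char* message)
{
    static const char key[] = "PrimeSo";           // 7 key characters + '\0'
    const std::size_t key_len = sizeof(key) - 1;   // ignore the terminator
    for (int i = 0; i < length; ++i) {             // length <= 0: no-op, like the jle
        message[i] ^= key[static_cast<std::size_t>(i) % key_len];
    }
}
```

It can stand in for `function` in a C++ driver when you just want to check the decoded text without assembling anything.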

With that said, let's look at the first few bytes.

xor P, 07 == xor 50, 07 --> 57 = W  
xor r, 1D == xor 72, 1D --> 6F = o  
xor i, 1E == xor 69, 1E --> 77 = w  
xor m, 41 == xor 6D, 41 --> 2C = ,  
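XOR is its own inverse, (c ^ k) ^ k == c, which is why the same loop both encodes and decodes, and why the examples above can be checked backwards. A tiny helper (hypothetical name `xor_byte`) makes that concrete:

```cpp
// XOR one byte with one key byte. Applying it twice with the same key
// returns the original value, so encoding and decoding are the same step.
unsigned char xor_byte(unsigned char value, unsigned char key)
{
    return static_cast<unsigned char>(value ^ key);
}
```

For example, `xor_byte(0x07, 'P')` gives 'W', and `xor_byte('W', 'P')` gives back 0x07, the first byte of the encoded text.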

For those wondering, here is the C++ code:

#include <cstdio>   // printf
#include <cstdlib>  // system

extern "C" void function(int length, char* message);

int main()
{
    char message[] = { 0x07, 0x1D, 0x1E, 0x41, 0x45, 0x2A, 0x00, 0x25, 0x52, 0x0D, 0x04, 0x01, 0x73, 0x06, 0x24, 0x53, 0x49, 0x39, 0x0D, 0x36, 0x4F, 0x35, 0x1F, 0x08, 0x04, 0x09, 0x73, 0x0E, 0x34, 0x16, 0x1B, 0x08, 0x16, 0x20, 0x4F, 0x39, 0x01, 0x49, 0x4A, 0x54, 0x3D, 0x1B, 0x35, 0x00, 0x07, 0x5C, 0x53, 0x0C, 0x08, 0x1E, 0x38, 0x11, 0x2A, 0x30, 0x13, 0x1F, 0x22, 0x1B, 0x04, 0x08, 0x16, 0x3C, 0x41, 0x33, 0x1D, 0x04, 0x4A, '\0'};
    function(sizeof(message) - 1, message);
    printf("Decoded Message is:\n%s\n", message);


    printf("\n");
    system("pause");
    return 0;
}

No, I did not insert the data into message manually. Also note that I added a string terminator at the end and used sizeof(message) - 1 so that the terminator itself is not decoded.
Here is the ASM code. It is simply a new file called assembly.asm with this in it:

.code

function proc
    xor r9d, r9d
    mov dword ptr [rsp + 18h], 6D697250h 
    mov word ptr [rsp + 1Ch], 5365h 
    mov byte ptr [rsp + 1Eh], 6Fh 
    movsxd r8, ecx
    test ecx, ecx
    jle short locret_140001342 
    mov ecx, r9d

loc_140001329:
    cmp rcx, 7
    cmovge rcx, r9
    inc rcx 
    mov al, [rsp + rcx + 17h]
    xor [rdx], al
    inc rdx
    dec r8
    jnz short loc_140001329

locret_140001342:
    ret

function endp
end  

In Visual Studio, you can set a breakpoint in here and open Debug > Windows > Registers and Debug > Windows > Memory > Memory 1 to see the registers and the program's memory. Note that rcx will contain the count, and rdx will point to the beginning of the encoded message.

Thank you all for your help and suggestions, I couldn't have done it without you.

Upvotes: 1

Peter de Rivaz

Reputation: 33509

I think your understanding is largely correct, a few minor corrections:

Correction 1

test ecx, ecx                 //tests exc and sets the labels

This sets the flags (not the labels).

Correction 2

cmp rcx, 7                    //moves 7(decimal) into rcx

This compares rcx with the immediate value 7 and sets the flags accordingly (i.e. after this instruction, a conditional instruction such as jg or cmovg will only take effect if rcx was greater than 7).

Correction 3

cmovge rcx, r9                //don't know

This conditionally (based on the flags you have just set) moves r9 into rcx. The condition is ge, so the move only happens if rcx was greater than or equal to 7. r9 contains 0, so the effect is to reset rcx to 0 once it reaches 7.
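In C++ terms the cmp/cmovge pair behaves roughly like the sketch below (`cmp7_cmovge` is just an illustrative name). One detail worth knowing: ge is a signed condition (it tests SF == OF), so the comparison is on a signed value; the unsigned counterpart would be cmovae.

```cpp
#include <cstdint>

// cmp rcx, 7 followed by cmovge rcx, r9: the result is r9 when the signed
// value of rcx is >= 7, otherwise rcx is left unchanged.
std::int64_t cmp7_cmovge(std::int64_t rcx, std::int64_t r9)
{
    return (rcx >= 7) ? r9 : rcx;
}
```

Because rcx only ever takes the values 0 to 7 in this routine, signed and unsigned comparisons give the same result here.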

Parameters

You are not given information on the parameters to the function, but it seems safe to assume that rcx is the original length of the data to be decrypted, and rdx is a pointer to the data.

Upvotes: 1

Weather Vane

Reputation: 34575

Here is my take on the code.

    ; rdx holds the message location
    ; ecx holds the message length

    xor r9d, r9d                ; r9d = 0
    mov [rsp+arg_10], 6D697250h ; fix up the key
    mov [rsp+arg_14], 5365h 
    mov [rsp+arg_16], 6Fh       ; which is "PrimeSo"
    movsxd r8, ecx              ; length counter
    test ecx, ecx               ; test the message length
    jle short locret_140001342  ; skip if invalid length
    mov ecx, r9d                ; reset key index to 0
loc_140001329: 
    cmp rcx, 7                  ; check indexing of key
    cmovge rcx, r9              ; reset if out of range
    inc rcx                     ; obfuscate by incrementing first
    mov al, [rsp+rcx+arg_F]     ; ... and indexing wrong offset
    xor [rdx], al               ; encrypt the message byte
    inc rdx                     ; advance message pointer
    dec r8                      ; loop count
    jnz short loc_140001329     ; next message byte
locret_140001342: 
    retn

I decoded the message with a C program implementing the algorithm, but that would be too easy, so I won't post it.

Reverse engineering

The code does not contain enough information to solve it top-down, because some registers are used without being loaded, and labels are not defined. I solved it bottom-up, by identifying the instruction that does the encryption, and working out from there.

Although the stack labels are not defined, the naming is enough of a clue that the parts of the key are consecutive, and assuming little-endian byte order reveals the key. This is confirmed by the opcode bytes in the listing, which show the three values being stored at displacements 18h, 1Ch and 1Eh.
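The little-endian reasoning can be checked with a few lines of C++. This sketch (hypothetical name `rebuild_key`) assumes a little-endian host such as x86-64 and copies the three immediates to consecutive offsets, just as the three mov instructions do:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Rebuilds the 7-byte key from the three immediates on a little-endian host:
// 6D697250h is laid out as 50 72 69 6D, 5365h as 65 53, then 6F.
std::string rebuild_key()
{
    unsigned char buf[7];
    const std::uint32_t part1 = 0x6D697250;      // mov dword ptr [rsp+18h]
    const std::uint16_t part2 = 0x5365;          // mov word ptr  [rsp+1Ch]
    const std::uint8_t  part3 = 0x6F;            // mov byte ptr  [rsp+1Eh]
    std::memcpy(buf + 0, &part1, sizeof part1);  // offsets 0..3 (18h..1Bh)
    std::memcpy(buf + 4, &part2, sizeof part2);  // offsets 4..5 (1Ch..1Dh)
    std::memcpy(buf + 6, &part3, sizeof part3);  // offset 6     (1Eh)
    return std::string(reinterpret_cast<const char*>(buf), sizeof buf);
}
```

On a big-endian machine the same code would produce the bytes in a different order, which is why the endianness assumption matters.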

Upvotes: 2

jcomeau_ictx

Reputation: 38472

One thing I noticed is that the values being stored at those stack offsets are ASCII:

>>> bytes.fromhex('5072696d65536f').decode()
'PrimeSo'

As for entering the data, you could convert the hex dump back to binary with xxd -r -p and have the program read it from stdin: xxd -r -p data.hex | ./myprog
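A minimal sketch of what such a program might look like, assuming the "PrimeSo" key shown above and raw bytes arriving on stdin; `decode_stream` and `decode_bytes` are hypothetical names:

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical "myprog" core: XORs every byte read from `in` with the
// cycling key "PrimeSo" and writes the decoded bytes to `out`.
void decode_stream(std::istream& in, std::ostream& out)
{
    static const char key[] = "PrimeSo";          // 7 characters + '\0'
    const std::size_t key_len = sizeof(key) - 1;
    std::size_t i = 0;
    char c;
    while (in.get(c)) {                           // get() does not skip whitespace bytes
        out.put(static_cast<char>(c ^ key[i % key_len]));
        ++i;
    }
}

// Convenience wrapper for decoding an in-memory buffer.
std::string decode_bytes(const std::string& encoded)
{
    std::istringstream in(encoded);
    std::ostringstream out;
    decode_stream(in, out);
    return out.str();
}
```

A `main` that calls `decode_stream(std::cin, std::cout);` (with `<iostream>` included) would then work in the shell pipeline above. On Windows, stdin is opened in text mode by default, so piping binary data may need extra care.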

Those arg_14 etc. offsets have to be declared somewhere in the sources, but I would guess they're the hex offsets 0xF, 0x10, 0x14 and 0x16.

Upvotes: 1
