Reputation: 1

Disassembler logic for custom opcodes in C

I'm building a disassembler that converts a file containing hexadecimal data into assembly language.

From this format I could read the hexadecimal data in the file as uint8_t values and store them in an array. I then decided to bit-shift the last number in the array to get the number of instructions in the last function; essentially I'm parsing backwards, since I don't know how much padding there is at the beginning, and the number of ops in a function is given at the end of the function. But then I realised that the operations vary in bit size and don't fall on clean 8- or 16-bit boundaries. So I got stuck, since my array, using the example at the top, was essentially this:

uint8_t hex[] = {0x00, 0x03, 0x02, 0x01, 0x42, 0x82, 0x86, 0x04, 0x10, 0x45};

Can anyone help me with the parsing logic? This is my first time posting, so I'm sorry if I'm missing anything; I will provide more information or delete the post if needed.

Upvotes: 0

Answers (1)

Jerry Jeremiah

Reputation: 9618

Instead of shifting and masking (which I think would be really complicated), what if you convert the uint8_t array into an array of bits? It uses a lot more memory, but you can access individual bits much more easily.

Here is a sample program that does this:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
    uint8_t value = 0;
    *index -= size; // decrement index to the starting point
    for(uint32_t i=0; i<size; i++)
        value = (value<<1) | bits[*index+i];
    return value;
}

int main()
{
    // sample program
    uint8_t array[] = {0x00,0x03,0x02,0x01,0x42,0x82,0x86,0x04,0x10,0x45};
    
    // program with zero padding
    // uint8_t array[] = {0xE8,0x39,0x06,0xA0,0xC4,0x16,0x82,0x90,0x4A,0x08,0x41};
    
    uint32_t array_size = sizeof(array)/sizeof(*array); // 10 bytes
    uint32_t bits_size = 8*array_size; // 80 bits, each stored in its own byte
    uint8_t* bits = malloc(bits_size);
    if (bits == NULL) return 1; // allocation failed
    
    for(uint32_t a=0;a<array_size;a++)
        for(uint32_t b=0;b<8;b++)
            bits[a*8+b] = (array[a] >> (7-b)) & 1;
    
    puts("Binary program file:");
    for(uint32_t i=0;i<bits_size;i++)
        printf("%s%d",(i%8?"":" "),bits[i]);
    puts("");
    
    enum                    {  MOV,  CAL,  RET,  REF,  ADD,  PRINT,  NOT,  EQU};
    uint8_t params[]      = {    2,    1,    0,    2,    2,      1,    1,    1};
    const char *opcodes[] = {"MOV","CAL","RET","REF","ADD","PRINT","NOT","EQU"};

    enum                    {  VAL,  REG,  STK,  PTR};
    uint8_t value_size[]  = {    8,    3,    5,    5};
    const char *types[]   = {"VAL","REG","STK","PTR"};

    uint32_t index = bits_size; // start at end
    
    // minimum program size is size(5) + opcode(3) + function(3) = 11 bits
    // if fewer bits than that remain, they must be padding
    while(index>10)
    {
        uint8_t size = getBits(bits,5,&index);
        printf("\nsize=%d\n",size);
        if (size > 0)
        {
            for(int o=0; o<size; o++)
            {
                uint8_t opcode = getBits(bits,3,&index);
                printf("opcode=%s",opcodes[opcode]);
                
                for(int p=0; p<params[opcode]; p++)
                {
                    printf("%c ",p?',':':');
                    
                    uint8_t type = getBits(bits,2,&index);
                    printf("type=%s ",types[type]);
                    
                    uint8_t value = getBits(bits,value_size[type],&index);
                    printf("value=%d",value);
                }
                
                puts("");
            }
        
            uint8_t function = getBits(bits,3,&index);
            printf("function=%d\n",function);
        }
    }
    free(bits); // release the expanded bit array
    return 0;
}

Try it at https://onlinegdb.com/S1qVStz8d
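
One thing the program above does not guard against is malformed input: if fewer than size bits remain, `*index -= size` wraps around because index is unsigned. Here is a minimal sketch of a guarded read, assuming you would rather report failure than read out of bounds. The name getBitsChecked and its out parameter are hypothetical, not part of the original program:

```c
#include <stdint.h>

/* Hypothetical variant of getBits(): returns 1 and stores the value in *out
   on success; returns 0 and leaves *index untouched if fewer than `size`
   bits remain or if `size` would not fit in a uint8_t result. */
int getBitsChecked(const uint8_t *bits, uint8_t size, uint32_t *index, uint8_t *out)
{
    if (size > 8 || size > *index)
        return 0;

    uint8_t value = 0;
    *index -= size; // step back to the first bit of the field
    for (uint32_t i = 0; i < size; i++)
        value = (uint8_t)((value << 1) | bits[*index + i]);

    *out = value;
    return 1;
}
```

The caller can then stop decoding as soon as a read fails instead of relying only on the `index>10` test in the outer loop.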

How getBits() works:

You make an array of individual digits from the original value, and then you take bits from it one at a time to make a new value - getBits() is the function I have written for that.

To understand how it works, imagine the same thing in base 10: 321 is put into the array {3,2,1}, and you could turn it back into a value with:

value = 0;
value = value*10 + digits[0];
value = value*10 + digits[1];
value = value*10 + digits[2];

Which gives (((0)*10+3)*10+2)*10+1 which is 321

If 5 (binary 101) is put into the array {1,0,1}, you could turn it back into a value with:

value = 0;
value = value*2 + bits[0];
value = value*2 + bits[1];
value = value*2 + bits[2];

Which gives (((0)*2+1)*2+0)*2+1 which is 5 (binary 101)

That works. A decent compiler would turn the *2 into <<1, and because the shift always leaves the lowest bit zero, + and | give the same result here, but you could write it that way yourself (which is what I did):

value = 0;
value = (value<<1) | bits[0];
value = (value<<1) | bits[1];
value = (value<<1) | bits[2];

Which produces the same value, binary 00000101

It's just a readability thing - with decimal you expect to see value*10+x but with binary you expect to see bit operations like shift/or instead of math operations like multiply/add.
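
To check that the two forms really agree, here is a small sketch that rebuilds a value from a bit array both ways. The names from_bits_arith and from_bits_shift are made up for this illustration, and both assume at most 8 elements, each 0 or 1:

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild a value with multiply/add, mirroring the base-10 analogy. */
uint8_t from_bits_arith(const uint8_t *bits, size_t n)
{
    uint8_t value = 0;
    for (size_t i = 0; i < n; i++)
        value = (uint8_t)(value * 2 + bits[i]);
    return value;
}

/* Rebuild the same value with shift/or. */
uint8_t from_bits_shift(const uint8_t *bits, size_t n)
{
    uint8_t value = 0;
    for (size_t i = 0; i < n; i++)
        value = (uint8_t)((value << 1) | bits[i]);
    return value;
}
```

For the array {1,0,1} both functions return 5.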

Then, if you use a loop with a size and an index that points to the end of the array, you get:

uint8_t value = 0;
index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
    value = (value<<1) | bits[index+i];

But, of course, if it is a function then index needs to be a pointer and you need to dereference it everywhere:

uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
    uint8_t value = 0;
    *index -= size; // decrement index to the starting point
    for(uint32_t i=0; i<size; i++)
        value = (value<<1) | bits[*index+i];
    return value;
}
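
If the extra memory for the bit array matters, the same backwards read can also be done directly on the packed bytes; the shifting and masking ends up confined to one line. This is only a sketch under the same assumptions as the program above (most significant bit first within each byte, size at most 8, enough bits remaining). The name getBitsPacked is hypothetical:

```c
#include <stdint.h>

/* Hypothetical packed-byte variant of getBits(): `index` counts bits from
   the start of `bytes`, and the field read ends just before *index. */
uint8_t getBitsPacked(const uint8_t *bytes, uint8_t size, uint32_t *index)
{
    uint8_t value = 0;
    *index -= size; // step back to the first bit of the field
    for (uint32_t i = 0; i < size; i++) {
        uint32_t pos = *index + i;                           // absolute bit position
        uint8_t bit = (bytes[pos / 8] >> (7 - pos % 8)) & 1; // pick that bit, MSB first
        value = (uint8_t)((value << 1) | bit);
    }
    return value;
}
```

For the sample array, starting with index = 80, reading 5 bits yields 5 (the instruction count) and reading the next 3 bits yields 2 (RET), the same values the bit-array version produces.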

Upvotes: 2
