werico4026

Reputation: 73

Creating shellcode: problems with mov reg to reg

OK, so I'm trying to create a function that creates shellcode.

I'm having a lot of problems working out the REX / ModRM stuff.

My current code kind of works.

So far, if both regs are smaller than R8 it works fine.

If I use one reg that is smaller than R8, it's fine too.

The problem is that once I have two regs that are R8 or larger (for example the same one twice), or if the src is smaller, I get problems.

enum Reg64 : uint8_t {
    RAX = 0, RCX = 1, RDX = 2, RBX = 3,
    RSP = 4, RBP = 5, RSI = 6, RDI = 7,
    R8 = 8, R9 = 9, R10 = 10, R11 = 11,
    R12 = 12, R13 = 13, R14 = 14, R15 = 15
};

inline uint8_t encode_rex(uint8_t is_64_bit, uint8_t extend_sib_index, uint8_t extend_modrm_reg, uint8_t extend_modrm_rm) {
    struct Result {
        uint8_t b : 1;
        uint8_t x : 1;
        uint8_t r : 1;
        uint8_t w : 1;
        uint8_t fixed : 4;
    } result{ extend_modrm_rm, extend_modrm_reg, extend_sib_index, is_64_bit, 0b100 };
    return *(uint8_t*)&result;
}
inline uint8_t encode_modrm(uint8_t mod, uint8_t rm, uint8_t reg) {
    struct Result {
        uint8_t rm : 3;
        uint8_t reg : 3;
        uint8_t mod : 2;
    } result{ rm, reg, mod };
    return *(uint8_t*)&result;
}

    inline void mov(Reg64 dest, Reg64 src) {
        if (dest >= 8)
            put<uint8_t>(encode_rex(1, 2, 0, 1));
        else if (src >= 8)
            put<uint8_t>(encode_rex(1, 1, 0, 2));
        else
            put<uint8_t>(encode_rex(1, 0, 0, 0));

        put<uint8_t>(0x89);

        put<uint8_t>(encode_modrm(3, dest, src));
    }

    //c.mov(Reg64::RAX, Reg64::RAX); // works
    //c.mov(Reg64::RAX, Reg64::R9); // works
    //c.mov(Reg64::R9, Reg64::RAX); // works
    //c.mov(Reg64::R9, Reg64::R9); // Does not work returns (mov r9,rcx)

Also, if there is a shorter way to do this without all the ifs, that would be great.

Upvotes: 2

Views: 224

Answers (1)

Peter Cordes

Reputation: 365352

FYI, most people create shellcode by assembling with a normal assembler like NASM, then hexdumping that binary into a C string. Writing your own assembler can be a fun project but is basically a separate project.


Your encode_rex looks somewhat sensible, taking four args for the four bits. But the code in mov that calls it passes a 2 sometimes, which will truncate to 0!
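
To see that truncation in isolation, here is a minimal sketch. The names OneBit, store_in_one_bit and check_truncation are hypothetical and only exist for this illustration; the field mirrors the 1-bit members of the question's Result struct.

```cpp
#include <cassert>
#include <cstdint>

// A 1-bit unsigned bit-field can only hold 0 or 1; wider values are reduced modulo 2.
struct OneBit {
    std::uint8_t bit : 1;
};

// Store `value` into a 1-bit field and return what actually survived.
inline unsigned store_in_one_bit(std::uint8_t value) {
    OneBit field{};
    field.bit = value;  // only the low bit of `value` is kept
    return field.bit;
}

// Small self-check: the 2 passed by the question's mov() ends up as 0.
inline void check_truncation() {
    assert(store_in_one_bit(2) == 0);
    assert(store_in_one_bit(1) == 1);
}
```

So a call like encode_rex(1, 2, 0, 1) behaves as if the 2 were a 0.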

Also, there are 4 possibilities for the 2 extension bits that matter for reg-reg moves (r and b). But your if / else if / else chain only covers 3 of them, ignoring the case dest >= 8 && src >= 8, where both bits must be set (r:b = 3).

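Since those two bits are orthogonal, you should just calculate them separately. Your ModRM puts src in the reg field and dest in the rm field, so with the argument order of encode_rex that is:

put<uint8_t>(encode_rex(1, 0, src>=8, dest>=8));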

The SIB-index x field should always be 0 because you don't have a SIB byte, just ModRM for a reg-reg mov.

You have your struct initializer in encode_rex mixed up, with extend_modrm_reg being 2nd where it will initialize the x field instead of r. Your bitfield names match https://wiki.osdev.org/X86-64_Instruction_Encoding#Encoding, but you have the wrong C++ variables initializing them. See that link for descriptions.
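
One way to avoid that kind of mix-up is to skip the bit-field and build the byte with explicit shifts. This is only a sketch: it keeps the question's parameter names and order, and the assert is an addition for illustration (it rejects values like 2 instead of silently truncating them).

```cpp
#include <cassert>
#include <cstdint>

// REX layout: 0100 W R X B  (bit 3 = W, bit 2 = R, bit 1 = X, bit 0 = B).
inline std::uint8_t encode_rex(std::uint8_t is_64_bit, std::uint8_t extend_sib_index,
                               std::uint8_t extend_modrm_reg, std::uint8_t extend_modrm_rm) {
    // Each argument is a single flag, so anything above 1 is a caller bug.
    assert(is_64_bit <= 1 && extend_sib_index <= 1 &&
           extend_modrm_reg <= 1 && extend_modrm_rm <= 1);
    return static_cast<std::uint8_t>(
        0x40                          // fixed high nibble 0100
        | (is_64_bit << 3)            // W: 64-bit operand size
        | (extend_modrm_reg << 2)     // R: extends ModRM.reg
        | (extend_sib_index << 1)     // X: extends SIB.index
        | extend_modrm_rm);           // B: extends ModRM.rm
}
```

With this version each parameter lands in the bit its name describes, regardless of how a compiler lays out bit-fields.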


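The dest / src order depends on which opcode you use. 0x89 is mov r/m64, r64: the destination is the r/m operand (extended by REX.B) and the source is the reg operand (extended by REX.R), so extend_modrm_reg has to come from src and extend_modrm_rm from dest. The other form, 0x8B (mov r64, r/m64), swaps those roles.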

Sanity check from NASM: I assembled with nasm -felf64 -l/dev/stdout to get a listing:

     1 00000000 4889C8                  mov rax, rcx
     2 00000003 4889C0                  mov rax, rax
     3 00000006 4D89C0                  mov r8, r8
     4 00000009 4989C0                  mov r8, rax
     5 0000000C 4C89C0                  mov rax, r8

You're using the same 0x89 opcode that NASM uses, so your REX prefixes should match.
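
Putting it together, here is a hedged sketch of a mov without any if chain that reproduces the bytes in the listing above. The Emitter struct and its bytes vector are hypothetical stand-ins for the question's class and put<uint8_t>(), and the enum is only a subset of the question's Reg64.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Subset of the question's enum, enough for the examples below.
enum Reg64 : std::uint8_t { RAX = 0, RCX = 1, R8 = 8, R9 = 9 };

// Hypothetical stand-in for the question's emitter; only mov is sketched.
struct Emitter {
    std::vector<std::uint8_t> bytes;

    // mov r/m64, r64 (opcode 0x89): dest is ModRM.rm, src is ModRM.reg.
    void mov(Reg64 dest, Reg64 src) {
        assert(dest < 16 && src < 16);
        const unsigned r = src >> 3;   // REX.R: high bit of the reg operand
        const unsigned b = dest >> 3;  // REX.B: high bit of the r/m operand
        // REX with W=1 and X=0 (no SIB byte for a reg-reg mov).
        bytes.push_back(static_cast<std::uint8_t>(0x48 | (r << 2) | b));
        bytes.push_back(0x89);
        // ModRM with mod=11 (register direct), low 3 bits of each register.
        bytes.push_back(static_cast<std::uint8_t>(0xC0 | ((src & 7) << 3) | (dest & 7)));
    }
};
```

For example, calling mov(R8, RAX) on an Emitter appends 49 89 C0, matching line 4 of the NASM listing.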


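return *(uint8_t*)&result; is a pointer-cast type pun. It only escapes strict-aliasing UB if uint8_t is a character type: that holds where it is a typedef for unsigned char (as on mainstream compilers), but ISO C++ doesn't guarantee it. On MSVC it is safe either way.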

Use memcpy to safely type-pun. (Or a union; most real-world C++ compilers including gcc/clang/MSVC do define the behaviour of union type-punning as in C99, unlike ISO C++).
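
A minimal sketch of the memcpy version, applied to the question's encode_modrm. Two assumptions to flag: the masking and the assert are additions, and the resulting byte still depends on the implementation-defined bit-field layout (first member in the low bits, as gcc, clang and MSVC do on x86-64).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline std::uint8_t encode_modrm(std::uint8_t mod, std::uint8_t rm, std::uint8_t reg) {
    struct Result {
        std::uint8_t rm : 3;
        std::uint8_t reg : 3;
        std::uint8_t mod : 2;
    };
    static_assert(sizeof(Result) == 1, "bit-fields are expected to pack into one byte");
    assert(mod <= 3);

    Result result{};
    result.rm = rm & 7;    // low 3 bits only; the 4th bit belongs in REX.B
    result.reg = reg & 7;  // low 3 bits only; the 4th bit belongs in REX.R
    result.mod = mod;

    // Copy the object representation instead of dereferencing a casted pointer.
    std::uint8_t byte;
    std::memcpy(&byte, &result, sizeof byte);
    return byte;
}
```

Compilers typically optimize the one-byte memcpy away, so this should cost nothing compared to the cast.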

Upvotes: 1
