Reputation: 989
Is there a way, within C code, to go from a textual representation of an ASM instruction (like cmpwi r3, 0x20) to its binary representation (0x2c030020)?
I am writing code that will be embedded into another application at runtime. That code is supposed to alter the behaviour / the code of the running program. That means, there is a bunch of code lines like this:
*((volatile int *)(0x80001234)) = 0x2c030020;
That code writes the ASM instruction cmpwi r3, 0x20 to 0x80001234, overwriting the current instruction at that address. Now, having the constant "0x2c030020" in my C code without knowing what that does is bad for maintaining the code. Thus, I'd usually add comments to code like the one above, stating the ASM instruction:
// 2c 03 00 20 = cmpwi r3, 0x20
However, from time to time these get out of sync. I might make a quick change to the integer value and forget to update the comment, or I might just make a typo in the comment, causing confusion.
Is there some way I could do something like this instead? (pseudo-code)
*((volatile int *)(0x80001234)) = asm("cmpwi r3, 0x20");
which would then result in 0x2c030020 being written to 0x80001234? Or would I need a hacky solution with a custom preprocessor running over my C source files, replacing ASM instructions with their byte code?
I know there is the C syntax for inline assembler code using the asm() function, but that would execute the given ASM instructions, not give me their binary representation.
Upvotes: 3
Views: 1871
Reputation: 365537
If you're building the code to run on PowerPC, another way to get those machine code bytes into your object file is with an asm statement at global scope that assembles instructions into the .data or .rodata section.
asm(".section .rodata \n\t" // or .data if you want to modify it
".globl machine_code; \n\t"
"machine_code: \n\t"
"cmpwi 3,0x20 \n\t"
... );
extern uint32_t machine_code[]; // Declaration of the symbol that you define with asm
This is at global scope, and I think GCC will always change to the section it wants before emitting asm for anything (data or code), so you should be fine with .section instead of .pushsection .rodata first / .popsection after, like you'd need if you were emitting some static data from an asm statement inside a function.
The extern uint32_t machine_code[]; C declaration connects the C array name to the asm symbol name so you can just access the array to copy from it.
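To make the "copy from the array" step concrete, here is a minimal sketch. The helper name patch_from_table and the MACHINE_CODE_WORDS count are my own illustrative names, not anything from GCC or the question. I'm using .pushsection / .popsection to be on the safe side, and I've added a non-PowerPC fallback that hard-codes the word from the question purely so the sketch can be compiled and tried on a development host; that fallback is an assumption for illustration, not part of the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
/* On a PowerPC target, let the assembler produce the instruction word. */
__asm__(".pushsection .rodata \n\t"
        ".balign 4 \n\t"
        ".globl machine_code \n\t"
        "machine_code: \n\t"
        "cmpwi 3,0x20 \n\t"
        ".popsection");
extern const uint32_t machine_code[];   /* symbol defined by the asm above */
#else
/* Assumption for host builds only: the same word, hard-coded from the question. */
static const uint32_t machine_code[] = { 0x2c030020u };
#endif

/* Number of instruction words assembled above (kept in sync by hand). */
#define MACHINE_CODE_WORDS 1u

/* Hypothetical helper: overwrite one instruction slot with word `index`
 * of the assembled table. */
static void patch_from_table(volatile uint32_t *target, size_t index)
{
    assert(index < MACHINE_CODE_WORDS);
    *target = machine_code[index];
}
```

On the real target you would call something like patch_from_table((volatile uint32_t *)0x80001234, 0); on a host you can point it at an ordinary uint32_t variable to check the value.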
(AFAIK, PowerPC doesn't have an equivalent of ARM Thumb or RISC-V RV32C, so instruction words are always 32-bit. On RISCs with compressed instructions, you might declare it as an array of uint16_t, or on x86 as an array of uint8_t, and finding instruction boundaries would be a separate problem.)
If you want to be able to execute this machine code from here, put it in .text, which is executable as well as readable. (And declare it as a function prototype instead of an array, or point a function pointer at the array.)
Nick's answer, using CPP constants for array initializers, has the advantage of giving you the machine code as compile-time constants the compiler can see and use as immediates, if it wants. It also results in portable C that can compile for targets other than PowerPC.
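For a sense of what "compile-time constants the compiler can see" can look like with no assembler in the build at all, here is a rough sketch of hand-written encoder macros. To be clear, this is a different technique from both answers here, and the macro names (PPC_CMPWI, PPC_ADDI, PPC_BLR) are made up for illustration. It assumes the standard D-form layout: primary opcode in the top 6 bits, then register fields, then a 16-bit immediate. It covers only the three instructions that appear in this thread, and the static_assert lines check the results against the words quoted in the thread.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* cmpwi rA, simm: cmpi (primary opcode 11) with BF = cr0 and L = 0. */
#define PPC_CMPWI(ra, simm) \
    ((uint32_t)((11u << 26) | (((uint32_t)(ra) & 31u) << 16) | \
                ((uint32_t)(simm) & 0xFFFFu)))

/* addi rT, rA, simm: primary opcode 14. */
#define PPC_ADDI(rt, ra, simm) \
    ((uint32_t)((14u << 26) | (((uint32_t)(rt) & 31u) << 21) | \
                (((uint32_t)(ra) & 31u) << 16) | ((uint32_t)(simm) & 0xFFFFu)))

/* blr has no operands, so it is just a fixed word. */
#define PPC_BLR ((uint32_t)0x4e800020u)

/* Compile-time checks against the values quoted in this thread. */
static_assert(PPC_CMPWI(3, 0x20) == 0x2c030020u, "cmpwi r3,0x20 encoding");
static_assert(PPC_ADDI(3, 3, 0) == 0x38630000u, "addi r3,r3,0 encoding");
```

The obvious downside is that you are now maintaining an encoder by hand, so this only pays off if you need a handful of instruction forms with varying operands.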
Upvotes: 1
Reputation: 25409
This sounds like a mad thing to do, but I assume you have a good reason for it. Life's no fun without a little bit of madness.
One approach is to run an assembler during your build to generate compile-time constants.
The first step is to make a file that has every assembly instruction you will use, one per line.
For example:
cmpwi 3,0x20
addi 3,3,0
blr
Name that file input.def. Then, use this shell script:
#!/usr/bin/env bash
(cat << HEADER
.global main
.text
main:
HEADER
cat input.def) > asm.s
powerpc-linux-gnu-as asm.s -o asm.o
powerpc-linux-gnu-objdump -d asm.o | \
sed '1,/<main>/ d' | \
paste -d'\t' - input.def | \
awk -F'\t' '{
bytes=$2
asm=$4
disasm=$3
gsub(/ /, "", bytes);
gsub(/[, ]+/, "_", asm);
printf("#define ASM_%-20s 0x%s // disassembly: %s\n", asm, bytes, disasm)
}'
# Clean temporaries
rm asm.s asm.o
(I am using GNU assembler and objdump here. You might need to change this part if you don't use those tools. objdump is being used as a glorified hexdump utility here.)
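This shell script wraps the contents of input.def in a minimal assembly file, assembles it, disassembles the object file, pairs each disassembled line with its source line from input.def, and prints one #define per instruction.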
This is a lot of work, but you can do all of it at compile time.
Run with its output redirected to asm.h, this produces a header file:
#define ASM_cmpwi_3_0x20 0x2c030020 // disassembly: cmpwi r3,32
#define ASM_addi_3_3_0 0x38630000 // disassembly: addi r3,r3,0
#define ASM_blr 0x4e800020 // disassembly: blr
You use the asm.h file like this:
#include "asm.h"
*((volatile int *)(0x80001234)) = ASM_cmpwi_3_0x20;
If you need a new asm constant, edit input.def and re-run the shell script.
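If you end up with many patch sites, a small helper can keep the writes readable. This is only a sketch: patch_word and patch_words are hypothetical names, and the three #defines below are a stand-in for the generated asm.h (values copied from the header shown above) so the snippet is self-contained. In real code you would #include "asm.h" instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generated asm.h; values copied from the header above. */
#define ASM_cmpwi_3_0x20 0x2c030020
#define ASM_addi_3_3_0   0x38630000
#define ASM_blr          0x4e800020

/* Hypothetical helper: overwrite a single 32-bit instruction word. */
static inline void patch_word(volatile uint32_t *target, uint32_t insn)
{
    *target = insn;
}

/* Hypothetical helper: write `count` consecutive instruction words
 * starting at `target`. */
static void patch_words(volatile uint32_t *target,
                        const uint32_t *insns, size_t count)
{
    assert(target != NULL && insns != NULL);
    for (size_t i = 0; i < count; i++)
        patch_word(&target[i], insns[i]);
}
```

On the target, the call would look something like patch_word((volatile uint32_t *)0x80001234, ASM_cmpwi_3_0x20); for testing on a host, pass the address of a local uint32_t array instead.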
Upvotes: 1