Damian Pereira

Reputation: 513

How do I clean up badly-formatted x86_64 assembly?

I have some assembly I want to clean up. It has all caps, inconsistent spacing and lots of unneeded newlines.

How do I beautify this x86_64 assembly code?

Upvotes: 3

Views: 826

Answers (1)

icecreamsword

Reputation: 947

I don't know of anything specific to assembly, but the things you've mentioned can be accomplished with sed. Note that the commands here rely on GNU sed extensions (\s, \t, \L and the I flag).

A few things to note:

  • Mnemonics are matched by the regex [A-Za-z0-9]+. Off the top of my head I can't think of any mnemonics that include other characters.
  • The upper half of the GPRs are matched by r(8|9|1[0-5])(b|w|d)?
  • The byte GPRs (excluding r8b-r15b) are matched by [abcd](l|h)|(sp|bp|si|di)l
  • The 16-, 32-, and 64-bit lower 8 GPRs are matched by [er]?([abcd]x|sp|bp|si|di)
  • SSE registers can be matched with the regex xmm(1[0-5]?|[02-9])
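
To sanity-check the patterns above, here is a minimal sketch that glues the four register patterns into one anchored extended regex and tries it on a few names with grep -E -i. The names REG_RE and is_reg are made up for this example, and the SSE character class is written as [02-9] (no comma inside the brackets).

```shell
#!/bin/sh
# REG_RE and is_reg are hypothetical names used only for this sketch.
# The alternatives are, in order: r8-r15 (with b/w/d suffix), the legacy
# byte registers, the 16/32/64-bit lower eight GPRs, and xmm0-xmm15.
REG_RE='^(r(8|9|1[0-5])(b|w|d)?|[abcd](l|h)|(sp|bp|si|di)l|[er]?([abcd]x|sp|bp|si|di)|xmm(1[0-5]?|[02-9]))$'

# Print "yes" if the whole argument is a register name, "no" otherwise.
# -i makes the match case-insensitive, -q suppresses grep's own output.
is_reg() {
    if printf '%s\n' "$1" | grep -Eiq "$REG_RE"; then
        echo yes
    else
        echo no
    fi
}

is_reg RAX      # 64-bit GPR
is_reg r10d     # low 32 bits of r10
is_reg xmm15    # highest SSE register covered
is_reg r16      # not covered by these patterns
```

The first three calls should report a match and the last one should not, since r(8|9|1[0-5]) stops at r15.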

For example:

# Replace tabs with spaces, then clean up lines of the form "op reg/imm, ..."
# N.B. without the /I option the match will be case-sensitive
sed 's/\t/ /g' <test.s | sed 's/^\s*\([a-z0-9][a-z0-9]*\)\s\s*\([a-z0-9][a-z0-9]*\)\s*,\s*/\t\1\t\2, /I'

# Lowercase all GPRs and SSE vector registers
# I have chosen not to use the more compact patterns above in the interest of readability.
# N.B. in sed's default (basic) regex syntax, alternation is written \|
... | sed 's/\([^a-z]\)\(AL\|AH\|AX\|EAX\|RAX\|...XMM0\|XMM1\|...\|XMM15\)/\1\L\2/gI'

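# Lowercase all instruction mnemonics. More specifically, this matches the first word on each line,
# provided it is followed by whitespace or the end of the line, so labels such as "foo:" are left alone.
# \E ends the lowercasing so that only the mnemonic is affected; leading whitespace is preserved.
... | sed 's/^\(\s*\)\([a-z0-9][a-z0-9]*\)\(\s\|$\)/\1\L\2\E\3/I'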
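
Putting the pieces together, something like the following might work as a single filter. It is only a sketch and assumes GNU sed: beautify_asm is a made-up name, the register list is deliberately shortened to keep the pattern readable, deleting blank lines is an extra step aimed at the "unneeded newlines" in the question, and it puts a single space (not a tab) after the mnemonic. It does not try to handle comments or string literals.

```shell
#!/bin/sh
# Hypothetical clean-up filter; reads assembly on stdin, writes to stdout.
# Requires GNU sed (\s, \b, \t, \?, \|, \L, \E and the I flag are extensions).
beautify_asm() {
    # 1. drop blank lines            2. strip trailing whitespace
    # 3. squeeze whitespace runs     4. normalise spacing around commas
    # 5. lowercase a (shortened) set of register names
    # 6. lowercase the first word and indent it with a tab, but only when
    #    it is followed by a space or end of line (so "label:" is skipped)
    sed -e '/^\s*$/d' \
        -e 's/\s*$//' \
        -e 's/\s\s*/ /g' \
        -e 's/\s*,\s*/, /g' \
        -e 's/\b\(r[abcd]x\|e[abcd]x\|[abcd][lhx]\|xmm[0-9][0-9]*\)\b/\L\1/gI' \
        -e 's/^ \?\([a-z0-9][a-z0-9]*\)\( \|$\)/\t\L\1\E\2/I'
}

# Usage (file names are placeholders):
#   beautify_asm <test.s >test.clean.s
```

The steps run in order on each line, so by the time the mnemonic rule fires, any leading indentation has already been squeezed to at most one space. Extending step 5 to the full register set is a matter of adding alternatives to the pattern.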

Upvotes: 5
