Reputation: 64905
Does nasm have any built-in way to emit long-nop (aka multi-byte nops) instructions of a given length?
Upvotes: 6
Views: 2991
Reputation: 11219
Just quoting the Intel optimization manual, https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, page 124 (3-28), from December 2017:
3.5.1.10 Using NOPs
Code generators generate a no-operation (NOP) to align instructions. Examples of NOPs of different lengths in 32-bit mode are shown below:
1-byte: XCHG EAX, EAX
2-byte: 66 NOP
3-byte: LEA REG, 0 (REG) (8-bit displacement)
4-byte: NOP DWORD PTR [EAX + 0] (8-bit displacement)
5-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (8-bit displacement)
6-byte: LEA REG, 0 (REG) (32-bit displacement)
7-byte: NOP DWORD PTR [EAX + 0] (32-bit displacement)
8-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
9-byte: NOP WORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
These are all true NOPs, having no effect on the state of the machine except to advance the EIP.
Because NOPs require hardware resources to decode and execute, use the fewest number to achieve the desired padding.
The one byte NOP:[XCHG EAX,EAX] has special hardware support. Although it still consumes a µop and its accompanying resources, the dependence upon the old value of EAX is removed.
This µop can be executed at the earliest possible opportunity, reducing the number of outstanding instructions and is the lowest cost NOP.
The other NOPs have no special hardware support. Their input and output registers are interpreted by the hardware. Therefore, a code generator should arrange to use the register containing the oldest value as input, so that the NOP will dispatch and release RS resources at the earliest possible opportunity.
Try to observe the following NOP generation priority:
• Select the smallest number of NOPs and pseudo-NOPs to provide the desired padding.
• Select NOPs that are least likely to execute on slower execution unit clusters.
• Select the register arguments of NOPs to reduce dependencies.
Upvotes: 2
Reputation: 64905
The answer seems to be no: out of the box, there is no official way to emit these long-nops in nasm [1].
So I just wrote my own macros for 1 to 9 bytes, based on the recommended sequences from the Intel manuals [2]:
;; long-nop instructions: nopX inserts a nop of X bytes
;; see "Table 4-12. Recommended Multi-Byte Sequence of NOP Instruction" in
;; "Intel® 64 and IA-32 Architectures Software Developer’s Manual" (325383-061US)
%define nop1 nop ; just a nop, included for completeness
%define nop2 db 0x66, 0x90 ; 66 NOP
%define nop3 db 0x0F, 0x1F, 0x00 ; NOP DWORD ptr [EAX]
%define nop4 db 0x0F, 0x1F, 0x40, 0x00 ; NOP DWORD ptr [EAX + 00H]
%define nop5 db 0x0F, 0x1F, 0x44, 0x00, 0x00 ; NOP DWORD ptr [EAX + EAX*1 + 00H]
%define nop6 db 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 ; 66 NOP DWORD ptr [EAX + EAX*1 + 00H]
%define nop7 db 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 ; NOP DWORD ptr [EAX + 00000000H]
%define nop8 db 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 ; NOP DWORD ptr [EAX + EAX*1 + 00000000H]
%define nop9 db 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 ; 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
I've also added these to the nasm-utils project, so that's one way to get them if you have the same need.
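Building on those sequences, here is a minimal sketch of a padding macro that emits an arbitrary number of bytes using as few long-nops as possible, in line with the Intel advice to use the fewest NOPs. The names nop_pad and nop_pad_rem are hypothetical (they are not part of nasm or nasm-utils), and the byte sequences are the same ones used in the macros above.

```nasm
;; Hypothetical helper, not part of nasm or nasm-utils:
;; nop_pad N emits N bytes of padding using as few long-nops as possible.
%macro nop_pad 1
    ;; as many 9-byte nops as fit
    %rep (%1) / 9
        db 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
    %endrep
    ;; bytes still left over (0 to 8), computed without the modulo operator
    %assign nop_pad_rem (%1) - ((%1) / 9) * 9
    %if nop_pad_rem == 1
        db 0x90
    %elif nop_pad_rem == 2
        db 0x66, 0x90
    %elif nop_pad_rem == 3
        db 0x0F, 0x1F, 0x00
    %elif nop_pad_rem == 4
        db 0x0F, 0x1F, 0x40, 0x00
    %elif nop_pad_rem == 5
        db 0x0F, 0x1F, 0x44, 0x00, 0x00
    %elif nop_pad_rem == 6
        db 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00
    %elif nop_pad_rem == 7
        db 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00
    %elif nop_pad_rem == 8
        db 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
    %endif
%endmacro

nop_pad 13      ; one 9-byte nop followed by one 4-byte nop
```

The remainder is computed with a subtraction rather than the modulo operator only to stay clear of the preprocessor's special treatment of the % character.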
[1] Although, as Jester points out, you can dig into the internals to find some macros used to implement the "smart align" feature.
[2] For the record, I believe these first appeared in the AMD manuals and that eventually Intel adopted the same recommended sequences.
Upvotes: 4
Reputation: 20720
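Note that, code-wise, the classic NOP instruction on Intel processors has opcode 0x90 and is just one byte. (Newer processors also have a dedicated multi-byte NOP, opcode 0F 1F /0, which the sequences in the other answers use.)
Longer "nops" can also be built from ordinary instructions that do nothing, such as XCHG of a register with itself. For example, for a "2-byte NOP", you can write: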
XCHG AL, AL
Which is encoded as:
86 C0
So you could write macros to get any size you'd like. It's a bit of work to find all of those "do nothing" instructions. Also, the assembler will often pick the shortest encoding for you, so emitting the raw bytes (for example with db) may be required to get the exact length you want.
The longest encoding that I know of uses the LEA instruction. Its displacement can be encoded in 8 or 32 bits, but since the displacement is all zeroes, an assembler will normally optimize it away, which is another reason to force the encoding or enter the bytes directly.
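To make the LEA point concrete, here is a small sketch for 32-bit mode. It relies on nasm's documented byte and dword keywords inside an effective address to force the size of a zero displacement; the choice of ESI is only an illustration. Check the resulting bytes in a listing file (nasm -l) with your nasm version.

```nasm
bits 32

;; force the displacement size so the zero offset is not optimized away
lea esi, [byte esi]      ; 8-bit zero displacement:  8D 76 00 (3 bytes)
lea esi, [dword esi]     ; 32-bit zero displacement: 8D B6 00 00 00 00 (6 bytes)

;; the same padding entered as raw bytes, which the assembler cannot shorten
db 0x8D, 0x76, 0x00
db 0x8D, 0xB6, 0x00, 0x00, 0x00, 0x00
```

Unlike the dedicated 0F 1F NOP, these LEA forms read and write a register, so they are "do nothing" instructions rather than true NOPs.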
And as Jester mentioned, you could use the existing macros. There is a copy of the file on the Internet.
https://github.com/letolabs/nasm/blob/master/macros/smartalign.mac
It can be fun to decode all of those instructions and see what they are. For example, they use a MOV SI, SI to create a 2-byte NOP.
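For reference, using those existing macros does not require copying the file: nasm ships them as the smartalign standard macro package. The sketch below assumes a nasm version that includes this package and uses its documented ALIGNMODE macro with the p6 mode, which selects the long-nop sequences. The label name target is made up for the example.

```nasm
%use smartalign         ; load nasm's smartalign standard macro package
bits 32
alignmode p6            ; pad with the long-nop (0F 1F) sequences

section .text
    ret
    align 16            ; this padding is generated by the package
target:                 ; hypothetical label, now 16-byte aligned
    ret
```

This only helps when the goal is alignment. To get a nop of one specific length at an arbitrary position, hand-written macros are still needed.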
Upvotes: -2