Reputation: 181
I think my real problem is I don't completely understand the stack frame mechanism so I am looking to understand why the following code causes the program execution to resume at the end of the application.
This code is called from a C function which is several call levels deep and the pushf causes program execution to revert back several levels through the stack and completely exit the program.
Since my work around works as expected I would like to know why using the pushf instruction appears to be (I assume) corrupting the stack.
In the routines I usually setup and clean up the stack with : sub rsp, 28h ... add rsp, 28h
However I noticed that this is only necessary when the assembly code calls a C function.
So I tried removing this from both routines but it made no difference. SaveFlagsCmb is an assembly function but could easily be a macro.
The code represents an emulated 6809 CPU Rora (Rotate Right Register A).
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
sub rsp, 28h
; Restore Flags
mov cx, word ptr [x86flags]
push cx
popf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
pushf ; this causes jump to end of program ????
pop dx ; this line never reached
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
add rsp, 28h
ret
Rora_I_A ENDP
However if I use this code it works as expected:
PUBLIC Rora_I_A ; Op 46 - Rotate Right through Carry A reg
Rora_I_A PROC
; sub rsp, 28h ; works with or without this!!!
; Restore Flags
mov ah, byte ptr [x86flags+LSB]
sahf
; Rotate Right the byte and save the FLAGS
rcr byte ptr [q_s+AREG], 1
; rcr only affects Carry. Save the Carry first in dx then
; add 0 to result to trigger Zero and Sign/Neg flags
lahf
mov dl, ah
and dx, CF ; Save only Carry Flag
add [q_s+AREG], 0 ; trigger NZ flags
mov rcx, NF+ZF+CF ; Flag Mask NZ
Call SaveFlagsCmb ; NZ from the add and CF saved in dx
; add rsp, 28h ; works with or without this!!!
ret
Rora_I_A ENDP
Upvotes: 2
Views: 982
Reputation: 364049
Your reported behaviour doesn't really make sense. Mostly this answer is just providing some background not a real answer, and a suggestion not to use pushf
/popf
in the first place for performance reasons.
Make sure your debugging tools work properly and aren't being fooled by something into falsely showing a "jump" to somewhere. (And jump where exactly?)
There's little reason to mess around with 16-bit operand size, but that's probably not your problem.
In Visual Studio / MASM, apparently (according to OP's comment) pushf
assembles as pushfw
, 66 9C
which pushes 2 bytes. Presumably popf
also assembles as popfw
, only popping 2 bytes into FLAGS instead of the normal 8 bytes into RFLAGS. Other assemblers are different.1
So your code should work. Unless you're accidentally setting some other bit in FLAGS that breaks execution? There are bits in EFLAGS/RFLAGS other than condition codes, including the single-step TF Trap Flag: debug exception after every instruction.
We know you're in 64-bit mode, not 32-bit compat mode, otherwise rsp
wouldn't be a valid register. And running 64-bit machine code in 32-bit mode wouldn't explain your observations either.
I'm not sure how that would explain pushf
being a jump to anywhere. pushf
itself can't fault or jump, and if popf
set TF then the instruction after popf
would have caused a debug exception.
Are you sure you're assembling 64-bit machine code and running it in 64-bit mode? The only thing that would be different if a CPU decoded your code in 32-bit mode should be the REX prefix on sub rsp, 28h
, and the RIP-relative addressing mode on [x86flags]
decoding as absolute (which would presumably fault). So I don't think that could explain what you're seeing.
Are you sure you're single-stepping by instructions (not source lines or C statements) with a debugger to test this?
Use a debugger to look at the machine code as you single-step. This seem really weird.
Anyway, it seems like a very low-performance idea to use pushf
/ popf
at all, and also to be using 16-bit operand-size creating false dependencies.
e.g. you can set x86 CF with movzx ecx, word ptr [x86flags]
/ bt ecx, CF
.
You can capture the output CF with setc cl
Also, if you're going to do multiple things to the byte from the guest memory, load it into an x86 register. A memory-destination RCR and a memory-destination ADD are unnecessarily slow vs. load / rcr / ... / test reg,reg
/ store.
LAHF/SAHF may be useful, but you can also do without them too for many cases. popf
is quite slow (https://agner.org/optimize/) and it forces a round trip through memory. However, there is one condition-code outside the low 8 in x86 FLAGS: OF
(signed overflow). asm-source compatibility with 8080 is still hurting x86 in 2019 :(
You can restore OF from a 0/1 integer with add al, 127
: if AL was originally 1
, it will overflow to 0x80
, otherwise it won't. You can then restore the rest of the condition codes with SAHF. You can extract OF with seto al
. Or you can just use pushf/popf.
; sub rsp, 28h ; works with or without this!!!
Yes of course. You have a leaf function that doesn't use any stack space.
You only need to reserve another 40 bytes (align the stack + 32 bytes of shadow space) if you were going to make another function call from this function.
In NASM, pushf/popf
default to the same width as other push/pop instructions: 8 bytes in 64-bit mode. You get the normal encoding without an operand-size prefix. (https://www.felixcloutier.com/x86/pushf:pushfd:pushfq)
Like for integer registers, both 16 and 64-bit operand-size for pushf/popf are encodeable in 64-bit mode, but 32-bit operand size isn't.
In NASM, your code would be broken because push cx
/ popf
would push 2 bytes and pop 8, popping 6 bytes of your return address into RFLAGS.
But apparently MASM isn't like that. Probably a good idea to use explicit operand-size specifiers anyway, like pushfw
and popfw
if you use it at all, to make sure you get the 66 9C
encoding, not just 9C pushfq
.
Or better, use pushfq
and pop rcx
like a normal person: only write to 8 or 16-bit partial registers when you need to, and keep the stack qword-aligned. (16-byte alignment before call, 8-byte alignment always.)
Upvotes: 3
Reputation: 2604
I believe this is a bug in Visual Studio. I'm using 2022, so it's an issue that's been around for a while.
I don't know exactly what is triggering it, however stepping over one specific pushf
in my code had the same symptoms, albeit with the code actually working.
Putting a breakpoint on the line after the pushf
did break, and allowed further debugging of my app. Adding a push ax
, pop ax
before the pushf
also seemed to fix the issue. So it must be a Visual Studio issue.
At this point I think MASM and debugging in Visual Studio has pretty much been abandoned. Any suggestions for alternatives for developing dlls on Windows would be appreciated!
Upvotes: 0