Reputation: 21
I would like to know the steps for setting/clearing EFLAGS.TF in user mode on an x86 IA-32 Intel CPU.
I tried the code below for clearing the TF flag, but I'm getting the error ***** Unhandled interrupt vector *****
__asm__ volatile("pushl %eax\n\t"
"pushfl\n\t"
"popl %eax\n\t"
"xorl $0x0100, %eax\n\t"
"pushl %eax\n\t"
"popfl\n\t"
"popl %eax\n\t");
Upvotes: 1
Views: 391
Reputation: 365267
XOR flips a bit instead of always clearing it: if TF happened to be clear already, your code just set it, and the next instruction raised a single-step debug exception (#DB, vector 1) that nothing handled. AND is one option for unconditionally clearing it, BTR (bit-test-reset) is another. BTR with a memory destination is really slow with a register source, but it's not bad at all with an immediate (only 2 uops on Haswell, 3 on Skylake; up to 4 on AMD, though, where it costs 2 uops even for btr $8, %eax).
popf is quite slow (9 uops, one per 20 cycles on Skylake; on Ryzen, 35 uops and one per 13 cycles: http://agner.org/optimize). So optimizing the surrounding code won't make a big difference, but it's fun to find a way to keep the code size compact.
You don't need to save/restore EAX yourself; just tell the compiler you want to clobber it, either with "eax" in the clobber list or with a dummy output operand. (Note that I'm using GNU C extended asm, not basic asm.)
static inline
void clear_tf(void) {
    long dummy;  // there's no type that's always 32-bit on 32-bit targets and always 64-bit
                 // on 64-bit targets: x32 uses 32-bit pointers in long mode, so uintptr_t or
                 // size_t doesn't work there.  long is the closest fit.
    // if porting to x86-64 System V user-space: beware that push clobbers the red zone
    __asm__ volatile("pushf \n\t"
                     "pop %[tmp] \n\t"
                     "btr $8, %[tmp]\n\t"   // reset bit 8, EFLAGS.TF
                     "push %[tmp] \n\t"
                     "popf"
                     : [tmp] "=r"(dummy)
                     : // no inputs
                     : // no clobbers.  (A "memory" clobber would block reordering with loads/stores.)
                     );
}
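For comparison, here's a sketch of the clobber-list alternative mentioned above (the function name is mine; note that explicit register names need %% in extended asm):
static inline
void clear_tf_clobber(void) {   // hypothetical name for the clobber-list variant
    __asm__ volatile("pushf \n\t"
                     "pop %%eax \n\t"
                     "btr $8, %%eax \n\t"   // reset bit 8, EFLAGS.TF
                     "push %%eax \n\t"
                     "popf"
                     : // no outputs
                     : // no inputs
                     : "eax");              // tell the compiler EAX is destroyed
}
The dummy-output version is usually nicer because it lets the compiler pick any free register instead of forcing EAX.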
Or simply don't touch any registers at all: this is very efficient, too, especially on AMD Ryzen, where there are no stack-sync uops and memory-destination AND is a single uop.
static inline
void clear_tf(void) {
    // if porting to x86-64 System V user-space: beware that push clobbers the red zone
    __asm__ volatile("pushf \n\t"
                     "andl $0xFFFFFEFF, (%esp) \n\t"  // clear bit 8 (TF); 1 byte larger than the pop/btr/push version
                     "popf"
                     );
    // Basic asm syntax: no operands, no clobbers.
}
For smaller code size, btrl $8, (%esp) is probably good: still only 2 uops on Haswell (3 on Skylake), but 2 bytes smaller than the andl version. andb $0xfe, 1(%esp) is also the same size, but it causes a store-forwarding stall and is 2 uops + a stack-sync uop on Intel when used after push. pop %eax / and $0xfe, %ah / push %eax is also the same size, and also 3 uops (plus a partial-register merging uop which issues in a cycle by itself on Haswell/SKL). But it's nice on AMD.
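Wrapped into a function, that compact version could look like this (a sketch in the same basic-asm style as the andl version; the name is mine):
static inline
void clear_tf_small(void) {   // hypothetical name for the size-optimized variant
    __asm__ volatile("pushf \n\t"
                     "btrl $8, (%esp) \n\t"   // reset TF directly in the saved FLAGS copy on the stack
                     "popf");
}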
Portability
BTW, in x86-64 System V user-space code you can't safely push/pop without clobbering the compiler's red zone, so you'd probably want an add $-128, %rsp before the push, and a sub $-128, %rsp afterward to restore it.
In kernel code there's no red zone, so push/pop inside inline asm is fine.
Windows uses a different ABI with no red zone.
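For example, a red-zone-safe 64-bit version might look like this (an untested sketch; the name is mine, and add $-128 gets a sign-extended imm8 encoding where sub $128 would need an imm32):
static inline
void clear_tf_64(void) {   // hypothetical name for an x86-64 user-space variant
    __asm__ volatile("add $-128, %rsp \n\t"   // step over the red zone
                     "pushfq \n\t"
                     "btrq $8, (%rsp) \n\t"   // reset bit 8, RFLAGS.TF
                     "popfq \n\t"
                     "sub $-128, %rsp");      // restore RSP
}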
Upvotes: 3
Reputation: 21
It worked fine with the code below. Thank you.
__asm__ volatile("pushl %eax;\
pushfl;\
popl %eax;\
andl $0xFFFFFEFF, %eax;\
pushl %eax;\
popfl;\
popl %eax;"
);
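For the setting half of the question, the same sequence with orl should work (an untested sketch: a debug-exception handler for #DB, vector 1, must be installed first, since every instruction executed with TF=1 will trap, which is exactly the unhandled-vector error above):
__asm__ volatile("pushl %eax;\
pushfl;\
popl %eax;\
orl $0x0100, %eax;\
pushl %eax;\
popfl;\
popl %eax;"
);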
Upvotes: 1