richejose

Reputation: 21

How to set/clear TF flag on x86 IA32 Intel CPU in user-mode

I would like to know the steps for setting/clearing EFLAGS.TF in user mode on an x86 IA-32 Intel CPU.

I tried the code below to clear the TF flag, but I get the error ***** Unhandled interrupt vector *****:

__asm__ volatile("pushl %eax\n\t"
                        "pushfl\n\t"
                        "popl %eax\n\t"
                        "xorl $0x0100, %eax\n\t"
                        "pushl %eax\n\t"
                        "popfl\n\t"
                        "popl %eax\n\t");

Upvotes: 1

Views: 391

Answers (2)

Peter Cordes

Reputation: 365267

XOR flips a bit instead of always clearing it. AND is one option; BTR (bit-test-reset) is another. BTR with a memory destination is really slow with a register source, but it's not bad at all with an immediate (only 2 uops on Haswell, 3 on Skylake. Up to 4 on AMD, though, where it costs 2 uops even for btr $8, %eax.)

popf is quite slow (9 uops, 1 per 20 cycles on Skylake). Or on Ryzen, 35 uops and one per 13 cycles. (http://agner.org/optimize). So optimizing the surrounding code won't make a big difference, but it's fun to find a way to keep the code-size compact.

You don't need to save/restore EAX yourself: just tell the compiler you want to clobber it, with : "eax" as the clobber list, or use a dummy output operand. (Note that I'm using GNU C extended asm, not basic asm.)

static inline
void clear_tf(void) {
    long dummy;       // there's no type that's always 32-bit on 32-bit, and always 64 on 64-bit.  x32 uses 32-bit pointers in long mode so uintptr_t or size_t doesn't work.
   // if porting to x86-64 System V user-space: beware that push clobbers the red-zone
    __asm__ volatile("pushf \n\t"
                     "pop   %[tmp] \n\t"
                     "btr   $9, %[tmp]\n\t"   // reset bit 9
                     "push  %[tmp] \n\t"
                     "popf"
                    : [tmp] "=r"(dummy)
                    : // no inputs
                    : // no clobbers.  ("memory" would block reordering with loads/stores.)
                );
}

Or simply don't touch any registers: this is very efficient too, especially on AMD Ryzen, where there are no stack-sync uops and memory-destination AND is a single uop.

static inline
void clear_tf(void) {
   // if porting to x86-64 System V user-space: beware that push clobbers the red-zone
    __asm__ volatile("pushf \n\t"
                     "andl $0xFFFFFEFF, (%esp) \n\t"  // 1 byte larger than the pop/btr/push version
                     "popf"
                );
    // Basic asm syntax: no clobbers.
}

For smaller code size, btrl $8, (%esp) is probably good: still only 2 uops on Haswell (3 on Skylake), and 2 bytes smaller than andl. andb $0xfe, 1(%esp) is the same size, but causes a store-forwarding stall and is 2 uops + a stack-sync uop on Intel when used after push. pop %eax / and $0xfe, %ah / push %eax is also the same size and also 3 uops (plus a partial-register merging uop, which issues in a cycle by itself on Haswell/Skylake), but it's nice on AMD.
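As a sketch of that memory-destination BTR variant (clear_tf_btr is my own name for it, not from the question; the #ifdef is mine so the same source builds as 32-bit or 64-bit x86, with the red-zone workaround the 64-bit case needs):

```c
// Sketch: clear TF using BTR directly on the saved flags on the stack.
// clear_tf_btr is a made-up name; guarded so it assembles in both modes.
static inline void clear_tf_btr(void) {
#ifdef __x86_64__
    __asm__ volatile("add  $-128, %%rsp \n\t"  // step over the 128-byte red zone
                     "pushfq \n\t"
                     "btrq $8, (%%rsp) \n\t"   // clear TF (bit 8) in the saved flags
                     "popfq \n\t"
                     "sub  $-128, %%rsp"       // undo the adjustment
                     ::: "cc");
#else
    __asm__ volatile("pushfl \n\t"
                     "btrl $8, (%%esp) \n\t"   // clear TF (bit 8) in the saved flags
                     "popfl"
                     ::: "cc");
#endif
}
```

Since TF is normally already clear in user mode, calling this is a safe no-op to test with.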


Portability

BTW, in x86-64 System V user-space code you can't safely push/pop without clobbering the compiler's red zone, so you'd probably want to add $-128, %rsp before push, and restore it after.
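A hedged sketch of that x86-64 port (clear_tf_64 is my name for it), applying the add $-128, %rsp trick around the pop/btr/push sequence from the first version:

```c
// Hypothetical x86-64 System V user-space port of the pop/btr/push version:
// move RSP past the red zone before pushing, restore it afterwards.
static inline void clear_tf_64(void) {
    unsigned long tmp;
    __asm__ volatile("add  $-128, %%rsp \n\t"  // skip the 128-byte red zone
                     "pushfq \n\t"
                     "pop  %[tmp] \n\t"
                     "btr  $8, %[tmp] \n\t"    // clear TF (bit 8)
                     "push %[tmp] \n\t"
                     "popfq \n\t"
                     "sub  $-128, %%rsp"       // undo the adjustment
                     : [tmp] "=r"(tmp)
                     :
                     : "cc");
}
```

add $-128 is used instead of sub $128 because -128 fits in a sign-extended imm8, keeping the instruction short.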

In kernel code there's no red-zone, so push/pop inside inline asm is fine.

Windows uses a different ABI with no red-zone.

Upvotes: 3

richejose

Reputation: 21

It worked fine with the code below. Thank you.

  __asm__ volatile("pushl %eax\n\t"
                   "pushfl\n\t"
                   "popl %eax\n\t"
                   "andl $0xFFFFFEFF, %eax\n\t"
                   "pushl %eax\n\t"
                   "popfl\n\t"
                   "popl %eax");

Upvotes: 1
