Reputation: 3870
I have the program below:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int x = 1;

void ouch(int sig) {
    printf("OUCH! dividing by zero!\n");
    x = 0;
}

void fpe(int sig) {
    printf("FPE! I got a signal: %d\n", sig);
    psignal(sig, "psignal");
    x = 1;
}

int main(void) {
    (void) signal(SIGINT, ouch);
    (void) signal(SIGFPE, fpe);
    while (1)
    {
        printf("Hello World: %d\n", 1/x);
        sleep(1);
    }
}
Problem: when I send SIGINT to the running program from the terminal, it prints "OUCH! dividing by zero!", as expected. The next message is "FPE! I got a signal: 8" followed by "psignal: Floating point exception", and this pair repeats endlessly. My question: the fpe signal handler sets x back to 1, so after it returns I expect "Hello World" to be printed again. Why does that not happen?
Below is a transcript of the output I am getting:
Hello World: 1
Hello World: 1
^COUCH! dividing by zero!
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
FPE! I got a signal: 8
psignal: Floating point exception
^COUCH! dividing by zero!
.
.
.
.
Upvotes: 1
Views: 1302
Reputation: 43698
After handling a signal raised while executing an instruction, the PC may return either to that instruction or to the following one. Which one it does is very CPU- and OS-specific. In addition, whether integer division by zero raises SIGFPE at all is also CPU- and OS-dependent.
At the CPU level, after taking an exception, it makes most sense to return to the offending instruction once the OS has had the chance to do whatever it needs to (think of page faults/TLB misses), and run that instruction again. (The OS may have to do some address correction first. For instance, on ARM CPUs the saved PC points two instructions past the offending instruction, a testament to their original 3-stage pipeline, while on MIPS CPUs it points to the jump when the exception is taken from an instruction in a jump delay slot.)
At the OS level, there are several ways to handle exceptions:
A non-portable method to deal with SIGFPE is calling longjmp() from the signal handler, as in my answer to a similar question on SIGSEGV.
n1318 has more details on calling longjmp() from a signal handler than you ever wanted to know. Also note that POSIX specifies that longjmp() should work from non-nested signal handlers.
Upvotes: 1
Reputation: 21306
When the signal handler is entered, the program counter (the CPU register pointing at the currently executing instruction) is saved, pointing at the instruction where the divide-by-zero occurred. Returning from the handler without removing the cause (in effect ignoring the signal) restores the PC to exactly the same place, upon which the signal is triggered again (and again, and again).
The value or volatility of 'x' is irrelevant by this point - the zero has been transferred into a CPU register in readiness to perform the divide.
man 2 signal notes that:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions. Integer division by zero has undefined result. On some architectures it will generate a SIGFPE signal. (Also dividing the most negative integer by -1 may generate SIGFPE.) Ignoring this signal might lead to an endless loop.
We can see this in gdb if you compile with the debug flag:
simon@diablo:~$ gcc -g -o sigtest sigtest.c
simon@diablo:~$ gdb sigtest
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
By default gdb won't pass SIGINT to the process - change this so it sees the first signal:
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal        Stop      Print     Pass to program   Description
SIGINT        Yes       Yes       Yes               Interrupt
Off we go:
(gdb) run
Starting program: /home/simon/sigtest
x = 1
Hello World: 1
Now let's interrupt it:
^C
Program received signal SIGINT, Interrupt.
0xb767e17b in nanosleep () from /lib/libc.so.6
and onwards to the divide:
(gdb) cont
Continuing.
OUCH! dividing by zero!
x = 0
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30          printf("Hello World: %d\n",1/x);
Check the value of 'x', and continue:
(gdb) print x
$1 = 0
(gdb) cont
Continuing.
FPE! I got a signal: 8
psignal: Floating point exception
Program received signal SIGFPE, Arithmetic exception.
0x0804853a in main () at sigtest.c:30
30          printf("Hello World: %d\n",1/x);
(gdb) print x
$2 = 1
x is clearly now 1 and we still got a divide-by-zero - what's going on? Let's inspect the underlying assembler:
(gdb) disassemble
Dump of assembler code for function main:
0x080484ca:  lea    0x4(%esp),%ecx
0x080484ce:  and    $0xfffffff0,%esp
...
0x08048533:  mov    %eax,%ecx
0x08048535:  mov    %edx,%eax
0x08048537:  sar    $0x1f,%edx
0x0804853a:  idiv   %ecx              <<-- address FPE occurred at
0x0804853c:  mov    %eax,0x4(%esp)
0x08048540:  movl   $0x8048653,(%esp)
0x08048547:  call   0x8048384
0x0804854c:  jmp    0x8048503
End of assembler dump.
One Google search later tells us that IDIV divides the value in the EAX register by the source operand (ECX). You can probably guess the register contents:
(gdb) info registers
eax            0x1      1
ecx            0x0      0
...
Upvotes: 10
Reputation: 93770
You should use volatile int x to ensure that the compiler reloads x from memory each time through the loop. Given that your SIGINT handler works, this probably does not explain your specific problem, but if you try more complicated examples (or crank up the optimization) it will eventually bite you.
Upvotes: 1