FedKad

Reputation: 605

GDB resuming execution after the exception cause is fixed

Assume I have a program whose statement at line 5 raises an arithmetic exception: division by zero.

When I run the program in GDB I get something like this:

Program received signal SIGFPE, Arithmetic exception.
0x... in ... at prog.c:5

At that point, I change the divisor variable to a nonzero value using set var b=1 and then try to "resume" from the same statement that got the exception.

Using the command continue gives the following:

Program terminated with signal SIGFPE, Arithmetic exception.
The program no longer exists.

What does work as expected is a command like jump 5 instead of continue.

Is there an easier GDB command for this, one that does not require specifying the line number where the exception occurred? A line may contain more than one statement, so addressing by line number is not always feasible.

I couldn't find anything about this in the GDB documentation.

Upvotes: 1

Views: 352

Answers (3)

FedKad

Reputation: 605

As @MarkPlotnick commented on my question, signal 0 is the answer I was looking for (subject to all the caveats mentioned in @CraigEstey's answer):

$ gdb -silent ./a.out  ## Program compiled with disabled optimizations!
Reading symbols from ./a.out...
(gdb) l 1,20
1   #include <stdio.h>
2   
3   void bol (int a, int b)
4   {
5     printf("%d / %d = %d\n", a, b, a/b);
6   } /* bol */
7   
8   int main (void)
9   {
10    int x, y;
11    for (x=170, y=6; x>-170; x-=30, y-=2)
12      bol(x, y);
13  } /* main */
(gdb) r
Starting program: [...]/a.out 
170 / 6 = 28
140 / 4 = 35
110 / 2 = 55

Program received signal SIGFPE, Arithmetic exception.
0x000055555555515f in bol (a=80, b=0) at prog.c:5
5     printf("%d / %d = %d\n", a, b, a/b);
(gdb) set var b=1
(gdb) sig 0
Continuing with no signal.
80 / 1 = 80
50 / -2 = -25
20 / -4 = -5
-10 / -6 = 1
-40 / -8 = 5
-70 / -10 = 7
-100 / -12 = 8
-130 / -14 = 9
-160 / -16 = 10
[Inferior 1 (process 7088) exited normally]
(gdb) q

Upvotes: 1

Luis Colorado

Reputation: 12668

Program received signal SIGFPE, Arithmetic exception.
0x... in ... at prog.c:5

At that point, I change the divisor variable to a nonzero value using set var b=1 and then try to "resume" from the same statement that got the exception.

Using the command continue gives the following: ...

Once the SIGFPE signal has actually been delivered to the program, the kernel terminates it, so there is nothing left to continue.

To do what you intend, you need to intervene before the program crashes. Put a breakpoint just before the division so that the debugger stops the program before it crashes, change the divisor variable (with the set var... command), then let the program continue and observe that it doesn't crash.

break 5
run
set var b = 5
cont

and now everything goes fine.

Even if your program ends without notifying the debugger, the kernel notifies the debugger of the termination. But once the program has terminated, there is no way to resume it from the state it had, not even from a core file used for post-mortem debugging. A core produced by an exception is not continuable: the process has died, and the debugger has no valid state from which to restart it.

But believe me, the best approach is to find out why the divisor is zero and prevent it in the program itself, rather than running the debugger to change the divisor every time execution reaches the division.
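As a minimal sketch of handling this in the program itself, the division can be guarded before it executes. The names checked_div and bol_checked are hypothetical (bol_checked is modeled on the bol() function from the gdb session elsewhere on this page), and the choice to print a message rather than abort is an assumption:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: stores a / b in *quotient only when the result
   is defined; returns false instead of letting the division trap. */
bool checked_div(int a, int b, int *quotient)
{
    if (b == 0)
        return false;   /* division by zero */
    if (a == INT_MIN && b == -1)
        return false;   /* quotient does not fit in int; also traps on x86 */
    *quotient = a / b;
    return true;
}

/* Hypothetical variant of bol(): reports the problem instead of crashing. */
void bol_checked(int a, int b)
{
    int q;
    if (checked_div(a, b, &q))
        printf("%d / %d = %d\n", a, b, q);
    else
        printf("%d / %d is undefined\n", a, b);
}
```

Calling bol_checked(x, y) from the loop in place of bol(x, y) would let the loop run through y == 0 without raising SIGFPE.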

Upvotes: -1

Craig Estey

Reputation: 33601

This approach can't work in the general case, particularly if optimization is on.

What you did just happened to work for your specific line/case. We can't tell too much because you didn't post the code.

The variable b starts in memory (either on the stack or in global memory).

At some point, it may end up in an FP register (e.g. for x86, xmm*).

So, when you do: set var b = 1 are you setting the memory location or the xmm register?

Even assuming only one statement per line, when you do jump you are restarting the statement (which can consist of multiple asm instructions).

What other things do we have to do to ensure that all other values (register, memory, etc.) are in the pre-statement/pre-instruction state?

With optimization, the value of b may have been cached in a register by a prior statement, and the statement you jump to may not reload it from memory.
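One way to picture the reload problem, offered as a sketch and not as something this answer proposes: if the divisor is read through a volatile object, the compiler has to emit a load from memory for every read that appears in the source. A statement restarted with jump then reads the divisor from memory again, which is where set var is assumed to write it. The names divide_reloading and b_in are hypothetical. This does not help when only the faulting instruction is re-executed and its operand is a register that was loaded earlier.

```c
#include <stdio.h>

/* Hypothetical debugging aid: the divisor lives in a volatile object,
   so each read of b in the source is compiled as a load from memory. */
int divide_reloading(int a, int b_in)
{
    volatile int b = b_in;  /* reads of b may not be cached across statements */
    return a / b;           /* the load of b is part of this statement */
}
```

The cost is that the compiler can no longer keep b in a register, so this only makes sense as a temporary debugging measure.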

What you want to do is restart the instruction. At that point, the 2nd arg to the instruction may be an xmm register or a memory location. How can we tell which one?

Also, if you have (e.g.):

a = get_a_value();
// several more statements ...
a /= b;  // line 5

Then, we assume a is in (e.g.) xmm0 [and b is in xmm1]. The following is pseudo-asm:

fdiv %xmm0,%xmm1 // a /= b

What does the [pseudo] fdiv instruction do to the value of a in %xmm0? After the instruction, is the value of a still intact, or does %xmm0 now hold a partial/garbage value? The value that is in %xmm0 after the failed instruction is architecture dependent. If the value is trashed, how do we determine the original/correct value to put back into %xmm0?

How many prior statements must be executed to restore the correct value of a?

You may have to disassemble the "offending" instruction and change the value of 2nd argument. If it's a memory location, change that. If it's an xmm* register, you'll have to change that.

And, you'll have to determine what has to be done to restore the a value (wherever it is residing).

The remediation that you want to do needs to be handled at the instruction level. Statement level is too coarse grained.

The reason that you can't find documentation on this is that what you're doing won't work too well in the general case and is contorting gdb.

gdb is designed so you can trap such an exception, determine what the bug in your program is, fix the source code, rebuild and rerun.

C [and/or machine code] isn't designed to allow such dynamic patch and restart from a prior point [in a debugger console] as you may have in an interpreted language such as Java/Perl/Python.

Upvotes: 1
