Reputation: 335
A few days ago I encountered an odd bug in my code where the variable `pi` (see below) apparently had the value 0: the line `lat += 0.5 * pi` did nothing, but `lat += 1` worked as expected. The bug vanished after changing the compiler version (from GCC 5.3.1 to GCC 6.2.1 on CentOS 6). What intrigued me most was that in gdb, at a breakpoint set at that position, `print pi` output the expected `3.141`, yet `lat += 0.5 * pi` still did not change the value of `lat` in any way. Somehow, the value of `pi` shown by gdb did not match the value used by the program.
It is deeply unsatisfying that the bug vanishes when I change my toolchain while I still don't know where the error came from (probably I did something wrong, or at least in a bad way). So, under what circumstances can't I trust the values gdb gives me? Can the external linkage of `pi` be a trap for debugging purposes?
// Constant.h
#pragma once
namespace abc {
const double pi = 3.141;
}

// Code.h
#pragma once
namespace abc {
extern const double pi;
int lat_func(double& lat);
}

// Code.cpp
#include "Constant.h"
#include "Code.h"
#include <iostream>
namespace abc {
int lat_func(double& lat) {
    std::cout << "pi: " << pi << std::endl; // `pi: 0`
    // When setting a breakpoint here, `p pi` returns `3.141` in gdb
    lat += 0.5 * pi; // lat is unchanged
    lat += 1; // 1 is added to lat
    return 0;
}
}

// main.cpp
#include "Code.h"
#include <iostream>
int main()
{
    double lat = 0.5;
    abc::lat_func(lat);
    std::cout << lat << std::endl;
}
Code.cpp is part of a static library that main.cpp is linked against.
Edit: The Makefile used is
INCLUDES = -I.
LIBS = -L. -lcode
CC = g++
CCFLAGS = -g -O0 -Wall -Wextra -pedantic -Wno-write-strings -Wno-unknown-pragmas -Wall -Wextra -pedantic -std=c++11
all: libcode.a main
main: main.o
	$(CC) -o $@ $^ $(LIBS)
libcode.a: Code.o
	ar cr $@ $^
%.o: %.cpp
	$(CC) $(CCFLAGS) $(INCLUDES) -fPIC -c -o $@ $<
Edit: As suggested in the comments, here is the disassembly of the line `lat += 0.5 * pi`. The difference between the two versions (as far as I can see) is that GCC 5.3.1 emits `pxor %xmm2,%xmm2` at one point, whereas GCC 6.2.1 emits `movsd 0xcfee57d(%rip),%xmm2` in its place.
GCC 5.3.1
lat += 0.5 * pi;
=> 0x0000000000f1c22c <+312>: mov -0x20(%rbp),%rax
0x0000000000f1c230 <+316>: movsd (%rax),%xmm1
0x0000000000f1c234 <+320>: pxor %xmm2,%xmm2
0x0000000000f1c238 <+324>: movsd 0x1269a0(%rip),%xmm0 # 0x1042be0
0x0000000000f1c240 <+332>: mulsd %xmm2,%xmm0
0x0000000000f1c244 <+336>: addsd %xmm1,%xmm0
0x0000000000f1c248 <+340>: mov -0x20(%rbp),%rax
0x0000000000f1c24c <+344>: movsd %xmm0,(%rax)
GCC 6.2.1
lat += 0.5 * pi;
=> 0x0000000000f34ca3 <+327>: mov -0x20(%rbp),%rax
0x0000000000f34ca7 <+331>: movsd (%rax),%xmm1
0x0000000000f34cab <+335>: movsd 0xcfee57d(%rip),%xmm2 # 0xdf23230 <_ZN3abc2piE>
0x0000000000f34cb3 <+343>: movsd 0x127735(%rip),%xmm0 # 0x105c3f0
0x0000000000f34cbb <+351>: mulsd %xmm2,%xmm0
0x0000000000f34cbf <+355>: addsd %xmm1,%xmm0
0x0000000000f34cc3 <+359>: mov -0x20(%rbp),%rax
0x0000000000f34cc7 <+363>: movsd %xmm0,(%rax)
Upvotes: 2
Views: 1233
Reputation: 33747
There can certainly be bugs in GDB and in GCC (which must provide the correct DWARF information, otherwise GDB will not work).
How did you determine that the `lat += 0.5 * pi;` statement executed? What often happens (at least in optimized code) is that you step over a line like this, but GDB executes it only partially (say, just the multiplication, or the computation of the address) and eventually returns to it after a few more steps. Then the addition might be performed, and after that, the store.
Complex statements like this one result in multiple machine instructions, and GCC schedules them independently, interleaving them with other instructions. With the way GCC currently generates debugging information for optimized code, it is not possible to represent that in a way that makes debugging with GDB straightforward.
Arguably, this is a GCC/GDB bug, but it's not easy to fix. A recent blog post, Statement Frontier Notes and Location Views, explains how the GNU toolchain folks plan to tackle this.
EDIT: After thinking about this some more, the most likely explanation is that this is simply a GCC bug in which the compiler does not process the `pi` variable correctly. GDB looks at the variable in memory and finds the expected value stored there, but for some reason the machine code GCC generated does not load it correctly. If this is a compiler bug, it is useless to speculate about GDB's behavior, because anything can happen with a compiler bug (and when confronted with such bugs, the only way to debug them is to look at the generated machine code and at dump files containing intermediate compiler information).
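One way to check this from inside the program, rather than from GDB, is to compare a forced memory read of `pi` with the value the compiler uses when the variable is named directly. The following is a minimal sketch under assumptions: the function names are hypothetical, `pi` is defined locally to keep the example self-contained (in the question it lives in a separate header and a static library), and it relies on the compiler emitting a real load for an access through a `volatile`-qualified pointer, which GCC does in practice.

```cpp
namespace abc {
const double pi = 3.141;  // stand-in for the definition in Constant.h
}

// Reads pi through a pointer to volatile, so the compiler should emit an
// actual load from the variable's storage (what GDB inspects with `print pi`).
double pi_from_memory() {
    const volatile double* p = &abc::pi;
    return *p;
}

// Names pi directly, leaving the compiler free to substitute whatever
// value it believes the constant has.
double pi_as_compiled() {
    return abc::pi;
}

// Hypothetical check: true when both views of pi agree.
bool pi_is_consistent() {
    return pi_from_memory() == pi_as_compiled();
}
```

With a correctly working compiler, `pi_is_consistent()` returns `true`. A `false` result in the affected build would point at the generated code rather than at GDB.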
Upvotes: 2