Reputation: 5631
I need to run a rather large software package on a new machine for work. The application is written in C and C++ and I am running on CentOS 6.5.
The program builds fine, but segfaults when I go to run it. Using valgrind, I see the following error reported at the location of the segfault:
==23843== Invalid read of size 4
[stack trace here]
==23843== Address 0x642e7464 is not stack'd, malloc'd or (recently) free'd
So for some reason we are reading from memory we aren't supposed to and are invoking undefined behaviour. When I tar up my source files, take them to another CentOS 6.5 machine (w/ same kernel) and compile them (with same makefiles and same GCC version) the program seems to run fine.
I ran valgrind on that machine as well and expected to see the invalid read again. My thought was that the invalid read would always be present, yet because the behaviour is undefined things just happened to work correctly on one machine and not on the other.
What I found, however, was that valgrind reports no read errors on the second machine. How could this be possible?
Upvotes: 3
Views: 424
Reputation: 2393
This could be caused by using different versions of Valgrind.
Newer versions remove some common false positives, which would explain why one machine (older version) complains about the read and the other (newer version) doesn't.
Upvotes: 0
Reputation: 188
Different library versions are the best guess, judging from the sparse information you gave. Things to try:
1) Bring both machines up to date via package manager and try again
2) Run ldd [binary] to see all libraries used by the program in question. Run something like md5sum on them on both machines to find out if there are differences.
In my experience, valgrind is really bad at detecting invalid memory accesses on the stack, so a stack overwrite might be the hidden root cause. If all else fails, you might want to try building with clang and AddressSanitizer. It might find things valgrind doesn't catch, and vice versa.
Upvotes: 0
Reputation: 4901
Valgrind makes the running environment more deterministic, but it does not eliminate all randomness. Maybe the other machine has slightly different versions of libraries installed, or something external the program uses (files, network, ...) is different. The code execution does not have to be exactly the same on both machines.
You should look at the stack trace and analyze the code where the error happens. If it is not obvious from the stack trace alone, you can start valgrind with the --vgdb=full parameter together with --vgdb-error=0. Valgrind then pauses and prints instructions on how to attach gdb, and once you continue it stops again each time an error is reported. Or you can just run the program under a debugger directly, since you wrote that it crashes even without valgrind.
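Roughly, the valgrind plus gdb session could look like the sketch below. The helper name debug_with_vgdb and the path ./your_program are placeholders. I am assuming --vgdb-error=0 here, which is the option that makes valgrind wait for a debugger; --vgdb=full only selects the more precise (and slower) gdbserver mode.

```shell
# Hypothetical helper: run a program under valgrind with its gdbserver enabled.
debug_with_vgdb() {
    # --vgdb-error=0: stop before the program starts and wait for gdb;
    # after "continue", valgrind stops again at each error it reports.
    valgrind --vgdb=full --vgdb-error=0 "$@"
}

# Terminal 1:
#   debug_with_vgdb ./your_program
# Terminal 2 (valgrind prints these same instructions when it pauses):
#   gdb ./your_program
#   (gdb) target remote | vgdb
#   (gdb) continue
```

When gdb stops at the invalid read, bt and info locals should show which pointer holds the bad address.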
Upvotes: 3