user1764386

Reputation: 5631

Valgrind reporting invalid read on one system but not another

I need to run a rather large software package on a new machine for work. The application is written in C and C++ and I am running on CentOS 6.5.

The program builds fine but segfaults when I run it. Using valgrind, I see the following error reported at the location of the segfault:

==23843== Invalid read of size 4
[stack trace here]
==23843==  Address 0x642e7464 is not stack'd, malloc'd or (recently) free'd

So for some reason we are reading memory we aren't supposed to, which invokes undefined behaviour. When I tar up my source files, take them to another CentOS 6.5 machine (with the same kernel) and compile them (with the same makefiles and the same GCC version), the program seems to run fine.

I ran valgrind on that machine as well and expected to see the invalid read again. My thought was that the invalid read would always be present, yet because the behaviour is undefined things just happened to work correctly on one machine and not on the other.

What I found, however, was that valgrind reports no read errors on the second machine. How could this be possible?

Upvotes: 3

Views: 424

Answers (3)

Deanie

Reputation: 2393

This could be caused by using different versions of Valgrind.

Some common false positives are eliminated in newer versions, which would explain why one machine (with the older version) complains and the other (with the newer version) doesn't. Compare the output of valgrind --version on both machines to check.

Upvotes: 0

Simon

Reputation: 188

Different library versions are the best guess, judging from the sparse information you gave. Things to try:

1) Bring both machines up to date via package manager and try again

2) Run ldd [binary] to see all libraries used by the program in question. Run something like md5sum on them on both machines to find out if there are differences.

In my experience, valgrind is poor at detecting invalid memory accesses on the stack, so a stack error might be the hidden root cause. If all else fails, you might want to try clang with AddressSanitizer. It might find things valgrind doesn't catch, and vice versa.

Upvotes: 0

michalsrb

Reputation: 4901

Valgrind makes the running environment more deterministic, but it does not eliminate all randomness. Maybe the other machine has slightly different versions of libraries installed, or something external the program uses (files, network, ...) is different. The code execution does not have to be exactly the same on both machines.

You should look at the stack trace and analyze the code where the error happens. If the cause is not obvious from the stack trace alone, you can start valgrind with the --vgdb=full --vgdb-error=1 options. With --vgdb-error=1, valgrind pauses execution when the first error is reported and prints instructions on how to attach gdb. Or you can just run the program under a debugger directly, since you wrote that it crashes even without valgrind.

Upvotes: 3
