Marc Mutz - mmutz

Reputation: 25333

Inconsistent string literal sharing with GCC

I'm having trouble understanding why, in this code:

#include <cstdio>

void use(const char *msg) { printf("%s\n", msg); }

void bar() { use("/usr/lib/usr/local/foo-bar"); }
void foo() { use("/usr/local/foo-bar"); }

int main() {
    bar();
    foo();
}

The compiler (GCC 4.9, in my case) decides to share the string literals:

$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /usr/
/usr/lib/usr/local/foo-bar

Yet in an otherwise identical program with different strings:

#include <cstdio>

void use(const char *msg) { printf("%s\n", msg); }

void bar() { use("/usr/local/var/lib/dbus/machine-id"); } // CHANGED
void foo() { use("/var/lib/dbus/machine-id"); }           // CHANGED

int main() {
    bar();
    foo();
}

it doesn't:

$ g++ -O2 -std=c++11 foo.cpp && strings a.out | grep /lib/
/usr/local/var/lib/dbus/machine-id
/var/lib/dbus/machine-id

EDIT:

With -Os, the second pair of strings is also shared. But that makes no sense: the code is just passing pointers. A lea with a constant offset can hardly hurt performance enough to justify allowing the sharing only in size-optimised mode.

There seems to be a size limit (30 bytes, including the terminating NUL) for string literal sharing. That, too, makes little sense, except perhaps to avoid overly long linker runs spent searching for common suffixes.
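To put numbers on the sizes involved, here is a minimal sketch that computes each literal's size including the terminating NUL and confirms that the shorter string is a suffix of the longer one by content. The helper names `literal_size`, `is_suffix` and `suffix_offset` are made up for illustration. For the first pair the sizes come out as 27 and 19 bytes, for the second pair as 35 and 25 bytes, which is at least consistent with a threshold around 30 but does not explain it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size of a string literal in bytes, including the terminating NUL.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) { return N; }

// True if `tail` is a suffix of `whole` by content.
// This says nothing about whether the two share storage.
inline bool is_suffix(const char *whole, const char *tail) {
    const std::size_t w = std::strlen(whole);
    const std::size_t t = std::strlen(tail);
    return t <= w && std::strcmp(whole + (w - t), tail) == 0;
}

// Offset inside `whole` at which `tail` would start if the two were merged.
inline std::size_t suffix_offset(const char *whole, const char *tail) {
    assert(is_suffix(whole, tail));
    return std::strlen(whole) - std::strlen(tail);
}
```

Calling `literal_size("/usr/local/var/lib/dbus/machine-id")` from `main` gives the size of the longer literal of the second pair; `suffix_offset` gives the constant that the merged pointer would be offset by.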

Upvotes: 2

Views: 351

Answers (1)

Rudolfs Bundulis

Reputation: 11954

This paper has a nice study of GCC and this topic. I was not aware of -fmerge-all-constants myself, but you can check whether it makes the strings overlap in both cases (the paper states that it does not work with -O3 and -Os).

EDIT

There was a valid comment that this answer is link-only (I meant it more as related information than as an actual answer), so I felt I needed to make it more extensive. I don't have a Linux machine available, so I tried both samples on http://gcc.godbolt.org/ to see what assembly is generated. Strangely enough, GCC 4.9 does not appear to merge the strings there (or my assembly knowledge is totally wrong). So the question is: could this be specific to your toolchain, or could the tool you use to inspect the binary be failing? See the images below:

[Two screenshots of the assembly output for the two samples; images not available]

Of course, if my understanding of the assembly is wrong and .LC1 and .LC3 can still overlap in the .rodata section, then this does not prove anything, but then at least someone will correct me and I'll know better.
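One way to settle this without reading assembly is to compare addresses in the final executable at run time. Below is a minimal sketch; `tail_position` and `shares_storage` are hypothetical helper names, and the `noinline` attribute is a GCC/Clang extension used here only as an attempt to keep the compiler from folding the comparison at compile time. What it reports for two string literals depends on the compiler, the optimisation level and the linker, so no particular result is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Address at which `tail` would have to live if it were stored
// as the last bytes of `whole`.
inline const char *tail_position(const char *whole, const char *tail) {
    const std::size_t w = std::strlen(whole);
    const std::size_t t = std::strlen(tail);
    assert(t <= w);
    return whole + (w - t);
}

// True if `tail` points into the storage of `whole`, i.e. the two
// strings really are one object in memory. Kept out of line so the
// check is made on the addresses present in the linked binary.
__attribute__((noinline))
bool shares_storage(const char *whole, const char *tail) {
    return tail_position(whole, tail) == tail;
}
```

Calling `shares_storage("/usr/lib/usr/local/foo-bar", "/usr/local/foo-bar")` from `main` and printing the result for each pair, built once with -O2 and once with -Os, should show whether the separate .LC labels end up overlapping after linking.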

Upvotes: 1
