Reputation:
From C we know what legal variable names are. The general regex for the legal names looks similar to [\w_](\w\d_)*.
Using dlsym we can look up symbols by arbitrary strings, and C++ mangles names that include @ in the ABI.
My question is: can arbitrary strings be used? The documentation on dlsym does not seem to mention anything about this.
Another question that came up appears to imply that it is fully possible to have arbitrary null-terminated symbols. This prompts me to ask the following question:
Why doesn't g++ emit raw function signatures, with name and parameter list, including namespace and class membership?
Here's what I mean:
namespace test {
class A
{
    int myFunction(const int a);
};
}

namespace test {
int A::myFunction(const int a) { return a * 2; }
}
It does not get compiled to
int ::test::A::myFunction(const int a)\0
Instead, on my 64-bit machine with g++ 4.9.2, it gets compiled to:
0000000000000000 T _ZN4test1A10myFunctionEi
This is the output of nm. The code was compiled with g++ -c test.cpp -o out.
Upvotes: 7
Views: 272
Reputation: 385144
(In this answer I ignore that you made several typos in your example of ::test::A::void myFunction(const int a).)
This format is not unique. All of the following spell the same function:
int ::test::A::myFunction(const int)
int ::test::A::myFunction(int const)
int test::A::myFunction(int const)
int test :: A :: myFunction (int const)
Meanwhile, I see no benefit at all in choosing a human-readable looks-like-C++ format for a C++ ABI. This stuff is supposed to be optimised for machines. Why would you make it less optimal for machines, in order to make it more optimal for humans? And probably failing at the latter whilst doing so.
You say that your compiler does not emit "raw symbols". I posit that it does precisely that.
Upvotes: 1
Reputation: 70263
You basically answered your own question:
The general regex for the legal names looks similar to [\w_](\w\d_)*.
From the beginning, C++ used preexisting (C) linker / loader technology. There is nothing "C++" about either ld, ld-linux.so, etc.
So linking is limited to what was already legal in C. That does not include colons, parentheses, ampersands, asterisks, or whatever else you would need to encode qualified C++ names and signatures in plain text.
Upvotes: 1
Reputation: 35998
Upvotes: 1
Reputation: 96241
I'm sure this decision was made pragmatically, to avoid having to change pre-existing C linkers (it quite possibly even originated with cfront). By emitting symbols drawn from the same character set the C linker already expects, you avoid a potentially long list of updates and can use the linker off the shelf.
Additionally, C and C++ are widely portable languages, and the designers wouldn't want to risk breaking a more obscure binary format (perhaps on an embedded system) by including unexpected characters in symbols.
Finally, since you can always demangle (with something like c++filt, for example), it probably didn't seem worth using a full text representation.
P.S. You would absolutely not want to include the parameter name in the function name: People will not be happy if renaming a parameter breaks ABI. It's hard enough to keep ABI compatibility already.
Upvotes: 5
Reputation: 25409
GCC is compliant with the Itanium C++ ABI. If your question is “Why does the Itanium C++ ABI require names to be mangled that way?” then the answer is probably
For the second point, there is a pretty good explanation in Ulrich Drepper's article How To Write Shared Libraries.
Upvotes: 1