Reputation: 41002
Assume main.c uses symbols from shared libs and local functions declared in main.c.
Is there a nice and elegant way to print a list of all the available function names and symbols at run time?
It should be possible since the data is loaded to the .code segment.
Upvotes: 29
Views: 9481
Reputation: 2654
I updated the code from Kanalpiroge's answer so that it also works when DT_HASH is missing (for example, on RHEL). It is written for 64-bit, but it is relatively easy to modify it to support 32-bit as well. The inspiration came from here: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/876879/18/snapshot/elf/elf_image_reader.cc#b512.
#include <link.h>
#include <string>
#include <vector>

using namespace std;

static uint32_t GetNumberOfSymbolsFromGnuHash(Elf64_Addr gnuHashAddress)
{
    // See https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ and
    // https://sourceware.org/ml/binutils/2006-10/msg00377.html
    typedef struct
    {
        uint32_t nbuckets;
        uint32_t symoffset;
        uint32_t bloom_size;
        uint32_t bloom_shift;
    } Header;

    Header* header = (Header*)gnuHashAddress;
    const void* bucketsAddress = (uint8_t*)gnuHashAddress + sizeof(Header) + (sizeof(uint64_t) * header->bloom_size);

    // Locate the chain that handles the largest index bucket.
    uint32_t lastSymbol = 0;
    uint32_t* bucketAddress = (uint32_t*)bucketsAddress;
    for (uint32_t i = 0; i < header->nbuckets; ++i)
    {
        uint32_t bucket = *bucketAddress;
        if (lastSymbol < bucket)
        {
            lastSymbol = bucket;
        }
        bucketAddress++;
    }

    if (lastSymbol < header->symoffset)
    {
        return header->symoffset;
    }

    // Walk the bucket's chain to add the chain length to the total.
    const void* chainBaseAddress = (uint8_t*)bucketsAddress + (sizeof(uint32_t) * header->nbuckets);
    for (;;)
    {
        uint32_t* chainEntry = (uint32_t*)((uint8_t*)chainBaseAddress + (lastSymbol - header->symoffset) * sizeof(uint32_t));
        lastSymbol++;

        // If the low bit is set, this entry is the end of the chain.
        if (*chainEntry & 1)
        {
            break;
        }
    }

    return lastSymbol;
}

/* Callback for dl_iterate_phdr.
 * It is called by dl_iterate_phdr for every loaded shared lib until one
 * call of this function returns something other than 0.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector)
{
    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym = 0;
    ElfW(Word*) hash;
    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {
        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {
            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr + info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until the
             * end of the section is reached. This is indicated by
             * an entry with d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH     -> second word of the hash is the number of symbols
             *  - DT_GNU_HASH -> used to count the symbols if DT_HASH is missing
             *  - DT_STRTAB   -> pointer to the beginning of a string table that
             *                   contains the symbol names
             *  - DT_SYMTAB   -> pointer to the beginning of the symbols table
             */
            while (dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_GNU_HASH && sym_cnt == 0)
                {
                    sym_cnt = GetNumberOfSymbolsFromGnuHash(dyn->d_un.d_ptr);
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* The order of the entries is not guaranteed, so the symbol table
             * is only read after the whole dynamic section has been scanned. */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations.
     * Since only the first entry, which is the executable itself, is needed,
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;
}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
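As a rough sketch of the 32-bit modification mentioned above: the only 64-bit-specific part of the GNU hash walk is the width of the bloom filter words, which matches the native address size. The helper name CountSymbolsFromGnuHash and the GnuHashHeader struct are my own, not part of the answer's code, and the sketch assumes the table is 4-byte aligned in memory.

```cpp
#include <link.h>    // ElfW (also pulls in <elf.h>)
#include <cassert>
#include <cstdint>
#include <cstring>

// Layout of the DT_GNU_HASH header: four 32-bit words.
struct GnuHashHeader
{
    uint32_t nbuckets;
    uint32_t symoffset;
    uint32_t bloom_size;
    uint32_t bloom_shift;
};

// Hypothetical helper: returns the number of dynamic symbols described by
// the GNU hash table that starts at `table`.
uint32_t CountSymbolsFromGnuHash(const void* table)
{
    assert(table != nullptr);
    const uint8_t* base = static_cast<const uint8_t*>(table);

    GnuHashHeader header;
    std::memcpy(&header, base, sizeof(header));

    // Bloom words are as wide as an address: 4 bytes on 32-bit ELF,
    // 8 bytes on 64-bit ELF. This replaces the hard-coded sizeof(uint64_t).
    const uint32_t* buckets = reinterpret_cast<const uint32_t*>(
        base + sizeof(header) + sizeof(ElfW(Addr)) * header.bloom_size);
    const uint32_t* chains = buckets + header.nbuckets;

    // Find the highest symbol index at which any bucket starts.
    uint32_t last = 0;
    for (uint32_t i = 0; i < header.nbuckets; ++i)
    {
        if (buckets[i] > last)
        {
            last = buckets[i];
        }
    }

    // No bucket is in use: only the symbols below symoffset exist.
    if (last < header.symoffset)
    {
        return header.symoffset;
    }

    // Walk that chain; an entry with the low bit set ends it.
    while ((chains[last - header.symoffset] & 1) == 0)
    {
        ++last;
    }
    return last + 1;
}
```

In the callback it could be called as CountSymbolsFromGnuHash((const void*)dyn->d_un.d_ptr) in the DT_GNU_HASH branch.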
Upvotes: 6
Reputation: 536
Since I had the same need to retrieve all loaded symbol names at runtime, I did some research based upon R..'s answer. So here is a detailed solution for Linux shared libraries in ELF format, which works with my gcc 4.3.4 and hopefully with newer versions as well.
I mostly used the following sources to develop this solution:
And here's my code. I used self-explanatory variable names and added detailed comments to make it understandable. If something is wrong or missing, please let me know... (Edit: I just realized that the question was for C and my code is for C++. But if you leave out the vector and the string, it should work for C as well.)
#include <link.h>
#include <string>
#include <vector>

using namespace std;

/* Callback for dl_iterate_phdr.
 * It is called by dl_iterate_phdr for every loaded shared lib until one
 * call of this function returns something other than 0.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector)
{
    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym = 0;
    ElfW(Word*) hash;
    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {
        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {
            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr + info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until the
             * end of the section is reached. This is indicated by
             * an entry with d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH   -> second word of the hash is the number of symbols
             *  - DT_STRTAB -> pointer to the beginning of a string table that
             *                 contains the symbol names
             *  - DT_SYMTAB -> pointer to the beginning of the symbols table
             */
            while (dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* The order of the entries is not guaranteed, so the symbol table
             * is only read after the whole dynamic section has been scanned. */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations.
     * Since only the first entry, which is the executable itself, is needed,
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;
}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
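The code above collects every entry of the dynamic symbol table, including data objects and the unnamed entry at index 0. If only function names are wanted, as in the question, the symbol type stored in st_info can be checked before the push_back. A minimal sketch, assuming the macros from <elf.h>; the helper names is_function_symbol and is_defined_symbol are mine:

```cpp
#include <link.h>   // ElfW
#include <elf.h>    // ELF64_ST_TYPE, STT_FUNC, SHN_UNDEF

// Hypothetical helper: true if the symbol table entry describes a function.
// ELF32_ST_TYPE and ELF64_ST_TYPE are defined identically (low 4 bits of
// st_info), so the same check works for both word sizes.
bool is_function_symbol(const ElfW(Sym)& sym)
{
    return ELF64_ST_TYPE(sym.st_info) == STT_FUNC;
}

// Hypothetical helper: true if the symbol is defined in this module rather
// than imported from another shared lib.
bool is_defined_symbol(const ElfW(Sym)& sym)
{
    return sym.st_shndx != SHN_UNDEF;
}
```

Inside the loop this would read, for example, if (is_function_symbol(sym[sym_index])) symbol_names->push_back(string(sym_name));. Functions imported from shared libs show up as undefined function symbols in the executable, so is_defined_symbol is only needed to tell local definitions apart from imports.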
Upvotes: 33
Reputation: 62848
This is not really a C-specific question; it depends on the operating system, the binary format, and (for debugging symbols and unmangled C++ symbol names) even the compiler. There is no generic way, and no truly elegant way either.
The most portable and future-proof way is probably to run an external program such as nm, which is in POSIX. The GNU version found on Linux systems probably has a bunch of extensions, which you should avoid if you aim for portability and future-proofing.
Its output should stay stable, and even if binary formats change, it will be updated and keep working. Just run it with the right switches, capture its output (probably by running it through popen to avoid a temp file), and parse that.
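A minimal sketch of the popen approach, assuming the POSIX portable output format of nm -P, where each line starts with the symbol name followed by a one-letter type. The names NmSymbol, parse_nm_line, and read_symbols are made up for this example, and the path ./a.out in the usage note is only a placeholder.

```cpp
#include <cstdio>    // popen and pclose are POSIX, declared in <stdio.h>
#include <sstream>
#include <string>
#include <vector>

struct NmSymbol
{
    std::string name;
    char type;   // e.g. 'T' = text (code) section, 'U' = undefined
};

// Parse one line of `nm -P` output: "<name> <type> [<value> [<size>]]".
bool parse_nm_line(const std::string& line, NmSymbol& out)
{
    std::istringstream in(line);
    std::string type;
    if (!(in >> out.name >> type) || type.size() != 1)
    {
        return false;
    }
    out.type = type[0];
    return true;
}

// Run `command` through the shell and parse every output line.
std::vector<NmSymbol> read_symbols(const std::string& command)
{
    std::vector<NmSymbol> symbols;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == NULL)
    {
        return symbols;
    }

    std::string line;
    char buffer[256];
    NmSymbol symbol;
    while (fgets(buffer, sizeof(buffer), pipe) != NULL)
    {
        line += buffer;
        if (line[line.size() - 1] != '\n')
        {
            continue;   // line longer than the buffer: keep reading
        }
        line.erase(line.size() - 1);
        if (parse_nm_line(line, symbol))
        {
            symbols.push_back(symbol);
        }
        line.clear();
    }
    // Handle a final line that lacks a trailing newline.
    if (!line.empty() && parse_nm_line(line, symbol))
    {
        symbols.push_back(symbol);
    }

    pclose(pipe);
    return symbols;
}
```

A call could look like read_symbols("nm -P ./a.out"). Because the command goes through the shell, a path that comes from user input would need proper quoting first.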
Upvotes: 6
Reputation: 215387
On dynamically linked ELF-based systems, you may have a function dl_iterate_phdr available. If so, it can be used to gather information on each loaded shared library file, and the information you get is sufficient to examine the symbol tables. The process is basically:
1. Find the location of the program headers from the dl_phdr_info structure passed back to you.
2. Use the PT_DYNAMIC program header to find the _DYNAMIC table for the module.
3. Use the DT_SYMTAB, DT_STRTAB, and DT_HASH entries of _DYNAMIC to find the list of symbols. DT_HASH is only needed to get the length of the symbol table, since it doesn't seem to be stored anywhere else.
The types you need should all be in <elf.h> and <link.h>.
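The first two steps of that process can be sketched as follows. This only locates each module's dynamic table and does not read any symbols; ModuleInfo, collect_module, and list_loaded_modules are names invented for this example.

```cpp
#include <link.h>
#include <string>
#include <vector>

struct ModuleInfo
{
    std::string name;          // often empty for the main program
    ElfW(Addr) base;           // load address of the module
    const ElfW(Dyn)* dynamic;  // the module's dynamic table, or null
};

// Callback for dl_iterate_phdr: records one entry per loaded module.
static int collect_module(struct dl_phdr_info* info, size_t, void* data)
{
    std::vector<ModuleInfo>* modules = static_cast<std::vector<ModuleInfo>*>(data);

    ModuleInfo module;
    module.name = info->dlpi_name ? info->dlpi_name : "";
    module.base = info->dlpi_addr;
    module.dynamic = nullptr;

    // Step 1: the program headers are in info->dlpi_phdr.
    for (size_t i = 0; i < info->dlpi_phnum; ++i)
    {
        // Step 2: the PT_DYNAMIC header gives the offset of the dynamic table.
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
        {
            module.dynamic = reinterpret_cast<const ElfW(Dyn)*>(
                info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }

    modules->push_back(module);
    return 0;   // 0 = continue with the next module
}

std::vector<ModuleInfo> list_loaded_modules()
{
    std::vector<ModuleInfo> modules;
    dl_iterate_phdr(collect_module, &modules);
    return modules;
}
```

Step 3 would then walk module.dynamic until an entry with d_tag == DT_NULL and pick out the DT_SYMTAB, DT_STRTAB, and DT_HASH entries.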
Upvotes: 13