I am losing the address of a (global) variable (defined in a shared object) when a function call is made. To prove it, I intentionally set the value of the variable num_accounts to 55. When the function get_num_accounts() began executing, the pointer it received for this variable was wrong. My proof is this gdb session:
accounts_init () at accounts.c:31
31 err_code=read_accounts();
(gdb) print num_accounts
$1 = 0
(gdb) print &num_accounts
$2 = (account_idx_t *) 0x7ffff767c640 <num_accounts>
(gdb) set var num_accounts=55
(gdb) print num_accounts
$3 = 55
(gdb) s
read_accounts () at accounts.c:66
66 err_code=get_num_accounts(&num_accounts);
(gdb) s
get_num_accounts (num_accounts_ptr=0x604780 <num_accounts>) at accounts.c:119
119 *num_accounts_ptr=0;
(gdb) print num_accounts
$4 = 55
(gdb) print *num_accounts_ptr
$5 = 0
(gdb)
(gdb) print num_accounts_ptr
$6 = (account_idx_t *) 0x604780 <num_accounts>
(gdb)
The address of the variable num_accounts is 0x7ffff767c640, but inside the function the pointer is 0x604780. Why does this happen?
The source code of the function get_num_accounts() is this:
err_code_t get_num_accounts(account_idx_t *num_accounts_ptr) {
    err_code_t err_code;
    uint32_t file_size;
    div_t div_result;
    unsigned short number;
    *num_accounts_ptr=0;
    err_code=get_dir();
    if (err_code!=ERR_NO_ERROR) return err_code;
    err_code=get_file(ACCOUNTS_FILENAME,sizeof(ACCOUNTS_FILENAME),&file_size);
    if (err_code!=ERR_NO_ERROR) return err_code;
    div_result=div(file_size,sizeof(tbl_account_t));
    if (div_result.rem!=0) {
        return ERR_BAD_CONFIG_FILE_FORMAT;
    }
    number=div_result.quot;
    *num_accounts_ptr=number;
    return ERR_NO_ERROR;
}
Type account_idx_t is defined as:
typedef unsigned short account_idx_t;
The global variable num_accounts is defined at the top of accounts.c:
account_idx_t num_accounts=0;
Basically, the function gets the size of the file and calculates the number of records it contains before reading it (it's a database).
And this is the calling code, which calls get_num_accounts():
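For comparison, here is a minimal sketch of that size-to-count step, assuming POSIX stat() in place of the question's get_dir()/get_file() helpers (which are not shown). The names records_from_size, count_records and the REC_* codes are hypothetical. The sketch also sidesteps two quiet pitfalls of the original: div() takes int arguments, and assigning the quotient to an unsigned short silently wraps if the file holds more than USHRT_MAX records.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch (not the question's err_code_t). */
enum rec_result { REC_OK = 0, REC_SYS_ERROR, REC_BAD_FORMAT, REC_TOO_MANY };

/* Convert a file size into a record count. Everything stays unsigned,
   and counts that would not fit in an unsigned short (the question's
   account_idx_t) are rejected instead of wrapping. */
static enum rec_result records_from_size(uint64_t file_size, size_t record_size,
                                         unsigned short *count)
{
    assert(record_size > 0 && count != NULL);

    if (file_size % record_size != 0)
        return REC_BAD_FORMAT;          /* truncated or corrupt file */

    uint64_t n = file_size / record_size;
    if (n > USHRT_MAX)
        return REC_TOO_MANY;            /* would silently wrap otherwise */

    *count = (unsigned short)n;
    return REC_OK;
}

/* Stat the file and derive the record count from its size. */
static enum rec_result count_records(const char *path, size_t record_size,
                                     unsigned short *count)
{
    struct stat st;

    assert(path != NULL && count != NULL);
    *count = 0;                         /* same "reset first" as the original */
    if (stat(path, &st) == -1)
        return REC_SYS_ERROR;           /* errno says why */
    return records_from_size((uint64_t)st.st_size, record_size, count);
}
```

Inside get_num_accounts() this would be called roughly as count_records(path, sizeof(tbl_account_t), num_accounts_ptr), assuming path holds the accounts file name.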
err_code_t accounts_init(void) {
    err_code_t err_code;
    err_code=read_accounts();
    if (err_code!=ERR_NO_ERROR) return err_code;
    return ERR_NO_ERROR;
}
err_code_t read_accounts(void) {
    err_code_t err_code;
    int ret;
    err_code=get_num_accounts(&num_accounts);
    if (err_code!=ERR_NO_ERROR) return err_code;
    if (num_accounts==0) return ERR_NO_ERROR;
    int fd=open(filename_buf,O_RDONLY); // filename_buf is global, it holds filename from previous call
    if (fd==-1) {
        return ERR_SYS_ERROR;
    }
    ret=read(fd,accounts,sizeof(tbl_account_t)*num_accounts);
    if (ret==-1) {
        return ERR_SYS_ERROR;
    }
    ret=close(fd); // TO_DO: validate return value of close(fd)
    if (ret==-1) {
        return ERR_SYS_ERROR;
    }
    return ERR_NO_ERROR;
}
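Separately from the address problem, read_accounts() treats only -1 from read() as an error: a short read would leave part of accounts unfilled, and the early return after a failed read() skips close(fd). A minimal sketch of a read-exactly helper, assuming a file shorter than expected should be reported as an error; read_full is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf, retrying on short reads and
   on EINTR. Returns 0 on success, -1 on a read error (errno set), and
   1 if end of file arrives before len bytes were read. */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        if (n == 0)
            return 1;           /* file shorter than expected */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

read_accounts() could then call read_full(fd, accounts, sizeof(tbl_account_t)*num_accounts) and close fd before returning on every path, including the error ones.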
I am compiling the library with the -fPIC flag:
[niko@dev1 src]$ make accounts.o
gcc -g -ffunction-sections -fdata-sections -Wall -Wextra -Wunreachable-code -Wmissing-prototypes -Wmissing-declarations -Wunused -Winline -Wstrict-prototypes -Wimplicit-function-declaration -Wformat -D_GNU_SOURCE -fshort-enums -fPIC -c accounts.c
There is no other 'num_accounts' symbol anywhere in the source code; I double-checked:
[niko@dev1 src]$ nm *o|grep num_accounts
0000000000000000 T get_num_accounts
0000000000000000 B num_accounts
[niko@dev1 src]$
Any suggestion on further debugging steps?
You have two distinct symbols called num_accounts in the executable image that gdb is looking at. gdb tells you that directly: whenever you print something that has a pointer-type value, gdb does a reverse lookup of that address in the symbol table of the executable and, if it finds something, prints the name of the symbol in <>. So when you do the gdb command:
(gdb) print &num_accounts
$2 = (account_idx_t *) 0x7ffff767c640 <num_accounts>
gdb is telling you that 0x7ffff767c640 points at a symbol called num_accounts. Similarly,
(gdb) print num_accounts_ptr
$6 = (account_idx_t *) 0x604780 <num_accounts>
tells you that 0x604780 points at a symbol that is also called num_accounts.
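The reverse lookup can be pictured with a toy model: given a table of symbols sorted by address, find the entry whose range covers the pointer value. This is a simplified sketch, not gdb's implementation (gdb also prints offsets such as <sym+4> for addresses inside an object). The table is hypothetical, built from the two addresses in the session, with sizes assuming a 2-byte unsigned short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symbol-table entry; a real debugger reads these from ELF symbol tables. */
struct sym {
    uint64_t addr;      /* start address */
    uint64_t size;      /* object size in bytes */
    const char *name;
};

/* Hypothetical table from the gdb session: two entries, same name. */
static const struct sym session_syms[] = {
    { 0x604780,       2, "num_accounts" },  /* in the base executable */
    { 0x7ffff767c640, 2, "num_accounts" },  /* in the shared library */
};

/* Return the name of the symbol whose [addr, addr + size) range contains
   target, or NULL. tab must be sorted by addr and non-overlapping. */
static const char *reverse_lookup(const struct sym *tab, size_t n, uint64_t target)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL || n == 0);
    /* Binary search: afterwards lo is the number of entries with addr <= target. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    const struct sym *s = &tab[lo - 1];
    return target < s->addr + s->size ? s->name : NULL;
}
```

Both lookups succeed and both report the same name, which is exactly why the two different pointers in the session are both annotated <num_accounts>.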
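Now the question is how you got two symbols with the same name. The large address 0x7ffff767c640 is going to be in a shared library, as shared libraries load at addresses like that. The small address 0x604780 is in the base executable.
A likely cause (an assumption, since the question does not show how the executable is linked or whether it refers to num_accounts) is a copy relocation: when a non-PIE executable refers to a data object defined in a shared library, the linker reserves a copy of that object in the executable's .bss, and the dynamic loader binds every GOT reference to it, including the library's own -fPIC references, to that copy. The code in accounts.c then consistently uses 0x604780, while gdb, evaluating num_accounts in the library's scope, shows the library's original and now unused definition. That would explain why your set var changed a value that get_num_accounts() never saw.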
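To confirm from inside the program which object a given address belongs to, one Linux-specific option is to look it up in /proc/self/maps, whose lines have the form start-end perms offset dev inode path. A minimal sketch; find_mapping is a hypothetical helper, and it assumes each maps line fits in a 4096-byte buffer:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Look up addr in /proc/self/maps (Linux only). On success, copy the
   mapping's path column into out (a file path, a pseudo-name such as
   "[stack]", or "" for anonymous memory) and return 1. Return 0 if no
   mapping contains addr or the file cannot be opened. */
static int find_mapping(const void *addr, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    uintptr_t target = (uintptr_t)addr;
    char line[4096];
    int found = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        int path_off = -1;

        /* Each line: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %n",
                   &start, &end, &path_off) < 2)
            continue;
        if (target < start || target >= end)
            continue;

        snprintf(out, outlen, "%s", path_off >= 0 ? line + path_off : "");
        out[strcspn(out, "\n")] = '\0';   /* drop the trailing newline */
        found = 1;
        break;
    }
    fclose(maps);
    return found;
}
```

Calling find_mapping(&num_accounts, buf, sizeof buf) from code in accounts.c and printing buf would show whether the address the library code actually uses lies in the executable or in the shared object.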