Nulik

Reputation: 7360

getting incorrect address of a variable during a function call

I am losing the address of a global variable (defined in a shared object) when a function call is made. To prove it, I intentionally set the variable num_accounts to 55 in gdb; when get_num_accounts() began executing, the pointer it received did not point at that variable. My proof is this gdb session:

accounts_init () at accounts.c:31
31      err_code=read_accounts();
(gdb) print num_accounts
$1 = 0
(gdb) print &num_accounts
$2 = (account_idx_t *) 0x7ffff767c640 <num_accounts>
(gdb) set var num_accounts=55
(gdb) print num_accounts
$3 = 55
(gdb) s
read_accounts () at accounts.c:66
66      err_code=get_num_accounts(&num_accounts);
(gdb) s
get_num_accounts (num_accounts_ptr=0x604780 <num_accounts>) at accounts.c:119
119     *num_accounts_ptr=0;
(gdb) print num_accounts
$4 = 55
(gdb) print *num_accounts_ptr
$5 = 0
(gdb) 
(gdb) print num_accounts_ptr
$6 = (account_idx_t *) 0x604780 <num_accounts>
(gdb) 

The address of the variable num_accounts is 0x7ffff767c640, but the function receives 0x604780 instead. Why does this happen?

The source code of the function get_num_accounts() is this:

err_code_t get_num_accounts(account_idx_t *num_accounts_ptr) {
    err_code_t err_code;
    uint32_t file_size;
    div_t div_result;
    unsigned short number;

    *num_accounts_ptr=0;
    err_code=get_dir();
    if (err_code!=ERR_NO_ERROR) return err_code;
    err_code=get_file(ACCOUNTS_FILENAME,sizeof(ACCOUNTS_FILENAME),&file_size);
    if (err_code!=ERR_NO_ERROR) return err_code;

    div_result=div(file_size,sizeof(tbl_account_t));
    if (div_result.rem!=0) {
        return ERR_BAD_CONFIG_FILE_FORMAT;
    }
    number=div_result.quot;
    *num_accounts_ptr=number;
    return ERR_NO_ERROR;
}

Type account_idx_t is defined as:

typedef         unsigned short          account_idx_t;

The global variable num_accounts is defined in accounts.c file at the beginning:

account_idx_t       num_accounts=0;

Basically, the function gets the size of the file and calculates the number of records it contains before reading it (it's a database).

And this is the calling code, which calls get_num_accounts():

err_code_t accounts_init(void) {
    err_code_t err_code;

    err_code=read_accounts();
    if (err_code!=ERR_NO_ERROR) return err_code;

    return ERR_NO_ERROR;
}
err_code_t read_accounts(void) {
    err_code_t err_code;
    int ret;

    err_code=get_num_accounts(&num_accounts);
    if (err_code!=ERR_NO_ERROR) return err_code;
    if (num_accounts==0) return ERR_NO_ERROR;

    int fd=open(filename_buf,O_RDONLY); // filename_buf is global, it holds filename from previous call
    if (fd==-1) {
        return ERR_SYS_ERROR;
    }
    ret=read(fd,accounts,sizeof(tbl_account_t)*num_accounts);
    if (ret==-1) {
        return ERR_SYS_ERROR;
    }
    ret=close(fd);  // TO_DO: validate return value of close(fd)
    if (ret==-1) {
        return ERR_SYS_ERROR;
    }
    return ERR_NO_ERROR;
}

I am compiling the library with the -fPIC flag:

[niko@dev1 src]$ make accounts.o
gcc -g -ffunction-sections -fdata-sections -Wall -Wextra -Wunreachable-code -Wmissing-prototypes -Wmissing-declarations -Wunused -Winline -Wstrict-prototypes -Wimplicit-function-declaration -Wformat -D_GNU_SOURCE -fshort-enums -fPIC -c accounts.c

There is no other 'num_accounts' symbol anywhere in the source code; I double-checked that:

[niko@dev1 src]$ nm *o|grep num_accounts 
0000000000000000 T get_num_accounts
0000000000000000 B num_accounts
[niko@dev1 src]$ 

Any suggestion on further debugging steps?

Upvotes: 2

Views: 632

Answers (1)

Chris Dodd

Reputation: 126253

You have two distinct symbols called num_accounts in the executable image that gdb is looking at. gdb tells you that directly: whenever you ask it to print a pointer-type value, it does a reverse lookup of that address in the symbol table and, if it finds a match, prints the name of the symbol in <>. So when you run the gdb command:

(gdb) print &num_accounts
$2 = (account_idx_t *) 0x7ffff767c640 <num_accounts>

gdb is telling you that 0x7ffff767c640 points at a symbol called num_accounts. Similarly,

(gdb) print num_accounts_ptr
$6 = (account_idx_t *) 0x604780 <num_accounts>

tells you 0x604780 points at a symbol that is also called num_accounts.

Now the question is how you got two symbols with the same name. The large address 0x7ffff767c640 is almost certainly in a shared library, since on x86-64 Linux shared libraries are typically mapped at high addresses like that. The small address 0x604780 is in the base executable.
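
One way to confirm which object each address belongs to, from inside the process, is to look the address up in /proc/self/maps. The following is only a minimal sketch, assuming Linux with /proc mounted; find_mapping() is a hypothetical helper, not something from the question's code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not part of the original code):
 * find the /proc/self/maps entry that contains addr and copy its pathname
 * into out (empty string for anonymous mappings).
 * Returns 0 if a mapping was found, -1 otherwise. */
int find_mapping(const void *addr, char *out, size_t out_size)
{
    uintptr_t target = (uintptr_t)addr;
    char line[4096];
    int result = -1;
    FILE *maps;

    if (out == NULL || out_size == 0)
        return -1;

    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        uintptr_t start, end;
        int path_off = 0;

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %*s %*s %*s %*s %n",
                   &start, &end, &path_off) < 2 || path_off == 0)
            continue;
        if (target < start || target >= end)
            continue;

        /* Copy the pathname column and drop the trailing newline. */
        snprintf(out, out_size, "%s", line + path_off);
        out[strcspn(out, "\n")] = '\0';
        result = 0;
        break;
    }

    fclose(maps);
    return result;
}
```

Calling find_mapping(&num_accounts, buf, sizeof buf) once from code in the library and once from code in the executable, and printing buf together with the address, should show whether the two sides resolve the name to the same object. Inside gdb, `info symbol 0x604780` and `info sharedlibrary` give similar information without changing the program.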

Upvotes: 1
