Reputation: 8998
Apologies for the probably weird question title. I didn't want it to look like a dupe with a title like "How does C file I/O work at the low level?". I want it to be obvious that my question is specific.
Anyways, when a file is fopen'd in C, it returns a struct _IO_FILE *.
FILE *f = fopen("hello.txt", "r");
printf("Fileno: %i\n", f->_fileno); // 3
I've looked at libio.h and gdb's "tab" output, and have confirmed that the contents of a struct _IO_FILE are as follows:
struct _IO_FILE {
    int _flags;
    char* _IO_read_ptr;
    char* _IO_read_end;
    char* _IO_read_base; // <-- file contents
    char* _IO_write_base;
    char* _IO_write_ptr;
    char* _IO_write_end;
    char* _IO_buf_base;
    char* _IO_buf_end;
    char *_IO_save_base;
    char *_IO_backup_base;
    char *_IO_save_end;
    struct _IO_marker *_markers;
    struct _IO_FILE *_chain;
    int _fileno;
    int _flags2;
    __off_t _old_offset;
    unsigned short _cur_column;
    signed char _vtable_offset;
    char _shortbuf[1];
    _IO_lock_t *_lock;
    __off64_t _offset;
    void *__pad1;
    void *__pad2;
    void *__pad3;
    void *__pad4;
    size_t __pad5;
    int _mode;
    char _unused2[...];
};
I've prodded at every one of them in gdb, and have noticed that f->_IO_read_base is 0x0 at first, but becomes a pointer to a proper string, which contains the entire contents of the file, only after having called fgetc() (or a similar function) at least once. After some gruelling and extensive searching of the glibc codebase, I seem to have tracked it down to a function called __uflow.
So my question is, how does _IO_read_base get initialized? Where does it get the contents from? How does it acquire said contents? When does _IO_read_base transform from a null pointer to a string? How would I go about doing this using only the struct itself and some system calls? I want to understand how this works at the low level.
...
(gdb) print fp->_IO_read_base
$3 = 0x0
(gdb) n
434 in genops.c
< a few more times ... >
_IO_getc (fp=0x602010) at getc.c:38
38 getc.c: No such file or directory.
(gdb) print fp->_IO_read_base
$4 = 0x7ffff7ff4000 "#include <stdio.h> ..."
(gdb)
You can see where it transforms. Somewhere in genops.c. Presumably __uflow(). But its source doesn't answer any questions:
int
__uflow (fp)
     _IO_FILE *fp;
{
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
  if (_IO_vtable_offset (fp) == 0 && _IO_fwide (fp, -1) != -1)
    return EOF;
#endif
  if (fp->_mode == 0)
    _IO_fwide (fp, -1);
  if (_IO_in_put_mode (fp))
    if (_IO_switch_to_get_mode (fp) == EOF)
      return EOF;
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr++;
  if (_IO_in_backup (fp))
    {
      _IO_switch_to_main_get_area (fp);
      if (fp->_IO_read_ptr < fp->_IO_read_end)
        return *(unsigned char *) fp->_IO_read_ptr++;
    }
  if (_IO_have_markers (fp))
    {
      if (save_for_backup (fp, fp->_IO_read_end))
        return EOF;
    }
  else if (_IO_have_backup (fp))
    _IO_free_backup_area (fp);
  return _IO_UFLOW (fp);
}
libc_hidden_def (__uflow)
Testing each call in gdb, every single if check fails, so I'm left to assume that it returns _IO_UFLOW (fp);. The funny thing is that _IO_UFLOW is a macro wrapper of __uflow, so...it's calling itself. And it's not recursing infinitely. Why?
And with that, I've hit a dead end, as there is still no explanation that I can find as to how fp->_IO_read_ptr gets filled out. All I know is that it happens "somewhere" in genops.c.
Upvotes: 2
Views: 1990
Reputation: 213754
On platforms with hardware watchpoint support in GDB, you can answer this question trivially by setting a watchpoint on fp->_IO_read_base. Example:
(gdb) watch -l fp->_IO_read_base
Hardware watchpoint 2: -location fp->_IO_read_base
(gdb) c
Continuing.
Hardware watchpoint 2: -location fp->_IO_read_base
Old value = 0x0
New value = 0x7ffff7ff7000 ""
__GI__IO_switch_to_get_mode (fp=fp@entry=0x602010) at genops.c:191
191 genops.c: No such file or directory.
(gdb) bt
#0 __GI__IO_switch_to_get_mode (fp=fp@entry=0x602010) at genops.c:191
#1 0x00007ffff7a8f670 in _IO_new_file_underflow (fp=0x602010) at fileops.c:602
#2 0x00007ffff7a841a5 in _IO_getdelim (lineptr=0x7fffffffdc88, n=0x7fffffffdc90, delimiter=10, fp=0x602010) at iogetdelim.c:77
#3 0x00000000004005b7 in main () at t.c:9
Upvotes: 1