Reputation: 34197
Disassembling printf doesn't give much info:
(gdb) disas printf
Dump of assembler code for function printf:
0x00401b38 <printf+0>: jmp *0x405130
0x00401b3e <printf+6>: nop
0x00401b3f <printf+7>: nop
End of assembler dump.
(gdb) disas 0x405130
Dump of assembler code for function _imp__printf:
0x00405130 <_imp__printf+0>: je 0x405184 <_imp__vfprintf+76>
0x00405132 <_imp__printf+2>: add %al,(%eax)
How is it implemented under the hood?
Why doesn't disassembling help?
What does the * mean before 0x405130?
Upvotes: 3
Views: 1938
Reputation: 340208
As for "What does * mean before 0x405130?"
I'm not familiar with gdb's disassembler, but it looks like jmp *0x405130 is an indirect jump through a pointer. Instead of disassembling what's at 0x405130, you should dump the 4 bytes of memory there. I'd be willing to bet that you'll find another address there, and if you disassemble that location you'll find printf()'s code (how readable that disassembly might be is another story).
In other words, _imp__printf is a pointer to printf(), not printf() itself.
Edit, after more information in the comments below:
A little poking around indicates that jmp *0x405130 is the GAS/AT&T syntax for the instruction written jmp [0x405130] in Intel syntax.
What makes this curious is that you say the gdb command x/xw 0x405130 shows that the address contains 0x00005274 (which seems to match up with what you got when you disassembled 0x405130). However, that would mean that jmp [0x405130] would try to jump to address 0x00005274, which doesn't seem right (and gdb said as much when you tried to disassemble that address).
It's possible that the _imp__printf entry is using some sort of lazy binding technique, where the first time execution jumps through 0x405130 it hits the 0x00005274 address, which causes the OS to field a trap and fix up the dynamic link. After the fixup, the OS will restart execution with the correct link address in 0x405130. But this is sheer guesswork on my part. I have no idea whether the system you're using does anything like this (indeed, I don't even know what system you're running on), but it's technically possible. If something like this is going on, you won't see the correct address in 0x405130 until after the first call to printf() has been made.
I think you'll need to single-step through a call to printf() at the assembly level to see what's really going on.
Updated information with a GDB session:
Here's the problem you're running into: you're looking at the process before the system has loaded the DLLs and fixed up the linkages to them. Here's a debugging session of a simple "hello world" program compiled with MinGW and debugged with GDB:
C:\temp>\mingw\bin\gdb test.exe
GNU gdb (GDB) 7.1
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "mingw32".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from C:\temp/test.exe...done.
(gdb) disas main
Dump of assembler code for function main:
0x004012f0 <+0>: push %ebp
0x004012f1 <+1>: mov %esp,%ebp
0x004012f3 <+3>: sub $0x8,%esp
0x004012f6 <+6>: and $0xfffffff0,%esp
0x004012f9 <+9>: mov $0x0,%eax
0x004012fe <+14>: add $0xf,%eax
0x00401301 <+17>: add $0xf,%eax
0x00401304 <+20>: shr $0x4,%eax
0x00401307 <+23>: shl $0x4,%eax
0x0040130a <+26>: mov %eax,-0x4(%ebp)
0x0040130d <+29>: mov -0x4(%ebp),%eax
0x00401310 <+32>: call 0x401850 <_alloca>
0x00401315 <+37>: call 0x4013d0 <__main>
0x0040131a <+42>: movl $0x403000,(%esp)
0x00401321 <+49>: call 0x4018b0 <printf>
0x00401326 <+54>: mov $0x0,%eax
0x0040132b <+59>: leave
0x0040132c <+60>: ret
End of assembler dump.
Note that disassembling printf() leads to a similar indirect jump:
(gdb) disas printf
Dump of assembler code for function printf:
0x004018b0 <+0>: jmp *0x4050f8 ; <<-- indirect jump
0x004018b6 <+6>: nop
0x004018b7 <+7>: nop
End of assembler dump.
And that the _imp__printf symbol makes no sense as code...
(gdb) disas 0x4050f8
Dump of assembler code for function _imp__printf:
0x004050f8 <+0>: clc ; <<-- how can this be printf()?
0x004050f9 <+1>: push %ecx
0x004050fa <+2>: add %al,(%eax)
End of assembler dump.
or as a pointer...
(gdb) x/xw 0x4050f8
0x4050f8 <_imp__printf>: 0x000051f8 ; <<-- 0x000051f8 is an invalid pointer
Now, let's set a breakpoint at main() and run to it:
(gdb) break main
Breakpoint 1 at 0x40131a: file c:/temp/test.c, line 5.
(gdb) run
Starting program: C:\temp/test.exe
[New Thread 11204.0x2bc8]
Error while mapping shared library sections:
C:\WINDOWS\SysWOW64\ntdll32.dll: No such file or directory.
Breakpoint 1, main () at c:/temp/test.c:5
5 printf( "hello world\n");
printf() looks the same:
(gdb) disas printf
Dump of assembler code for function printf:
0x004018b0 <+0>: jmp *0x4050f8
0x004018b6 <+6>: nop
0x004018b7 <+7>: nop
End of assembler dump.
but _imp__printf looks different, because the dynamic link has now been fixed up:
(gdb) x/xw 0x4050f8
0x4050f8 <_imp__printf>: 0x77bd27c2
And if we disassemble what _imp__printf is now pointing to, it might not be very readable, but it's clearly code now. This is printf() as implemented in MSVCRT.DLL:
(gdb) disas _imp__printf
Dump of assembler code for function printf:
0x77bd27c2 <+0>: push $0x10
0x77bd27c4 <+2>: push $0x77ba4770
0x77bd27c9 <+7>: call 0x77bc84c4 <strerror+554>
0x77bd27ce <+12>: mov $0x77bf1cc8,%esi
0x77bd27d3 <+17>: push %esi
0x77bd27d4 <+18>: push $0x1
0x77bd27d6 <+20>: call 0x77bcca49 <msvcrt!_lock+4816>
0x77bd27db <+25>: pop %ecx
0x77bd27dc <+26>: pop %ecx
0x77bd27dd <+27>: andl $0x0,-0x4(%ebp)
0x77bd27e1 <+31>: push %esi
0x77bd27e2 <+32>: call 0x77bd400d <wscanf+3544>
0x77bd27e7 <+37>: mov %eax,-0x1c(%ebp)
0x77bd27ea <+40>: lea 0xc(%ebp),%eax
0x77bd27ed <+43>: push %eax
0x77bd27ee <+44>: pushl 0x8(%ebp)
0x77bd27f1 <+47>: push %esi
0x77bd27f2 <+48>: call 0x77bd3330 <wscanf+251>
0x77bd27f7 <+53>: mov %eax,-0x20(%ebp)
0x77bd27fa <+56>: push %esi
0x77bd27fb <+57>: pushl -0x1c(%ebp)
0x77bd27fe <+60>: call 0x77bd4099 <wscanf+3684>
0x77bd2803 <+65>: add $0x18,%esp
0x77bd2806 <+68>: orl $0xffffffff,-0x4(%ebp)
0x77bd280a <+72>: call 0x77bd281d <printf+91>
0x77bd280f <+77>: mov -0x20(%ebp),%eax
0x77bd2812 <+80>: call 0x77bc84ff <strerror+613>
0x77bd2817 <+85>: ret
0x77bd2818 <+86>: mov $0x77bf1cc8,%esi
0x77bd281d <+91>: push %esi
0x77bd281e <+92>: push $0x1
0x77bd2820 <+94>: call 0x77bccab0 <msvcrt!_lock+4919>
0x77bd2825 <+99>: pop %ecx
0x77bd2826 <+100>: pop %ecx
0x77bd2827 <+101>: ret
0x77bd2828 <+102>: int3
0x77bd2829 <+103>: int3
0x77bd282a <+104>: int3
0x77bd282b <+105>: int3
0x77bd282c <+106>: int3
End of assembler dump.
It's probably harder to read than you might hope because I'm not sure if proper symbols are available for it (or whether GDB can properly read those symbols).
However, as I mentioned in another answer, you can typically get the source for C runtime routines with your compiler, whether it's open source or not. MinGW doesn't come with the source for MSVCRT.DLL since that's a Windows component, but you can get the source for it (or something pretty close to it) in a Visual Studio distribution; I think even the free VC++ Express comes with the runtime source (but I might be wrong about that).
Upvotes: 1
Reputation: 18446
The * is AT&T assembler syntax for an indirect memory reference, i.e.
jmp *<addr>
means "jump to the address stored in <addr>".
It is equivalent to the following Intel syntax:
jmp [addr]
Branch addressing using registers or memory operands must be prefixed by a '*'.
Upvotes: 3
Reputation: 340208
Virtually all C compilers provide the source to their runtime libraries, not just open source compilers. Unfortunately, that source is often written in a rather difficult-to-follow form, and it doesn't generally come with design rationale documents.
So, a very nice resource for dealing with that problem is P.J. Plauger's "The Standard C Library", which provides not only the source for a library implementation but also has details on how it's designed and the special situations that such a library might have to consider.
At the prices that some of the 'used' versions of the book are being offered, it's a steal and should be on any serious C programmer's bookshelf.
Plauger has similar books targeting the C++ library that I think have similar value:
Upvotes: 2
Reputation: 22922
I'd say disassembling works just fine here, and that printf is implemented 'under the hood' using vfprintf, which is pretty much what you'd expect. Note that assembler is typically much more verbose than C, and time-consuming to make sense of when you don't have the annotated source. Compiler output is not a great way of teaching yourself assembler either.
Upvotes: 1
Reputation: 61713
printf() is most likely located in a dynamic shared library.
The dynamic linker fills a table with the addresses of the imported functions; that's why you have to make that indirect call.
I don't really recall how this works; it's probable that optimizations complicate the process. But you get the idea.
Upvotes: 0
Reputation: 229108
Here's one particular implementation, http://ftp.fr.openbsd.org/pub/OpenBSD/src/lib/libc/stdio/printf.c and http://ftp.fr.openbsd.org/pub/OpenBSD/src/lib/libc/stdio/vfprintf.c
Upvotes: 8