Reputation: 127
I don't understand why the 3rd and 4th printf calls print 54 and -61.
I expected the program to print 0, because a character pointer should only read (sizeof(char) * 8) bits, and 54 in binary is 00000000 00110110.
#include<stdio.h>
void main()
{
int i=54;
float a=3.14;
char *ii,*aa;
ii=(char *)&i;
aa=(char *)&a;
printf("%u\n",ii);
printf("%u\n",aa);
printf("%d\n",*ii);
printf("%d\n",*aa);
}
Edit: I typed %d in the fourth printf by mistake. If I use %f there instead, it prints 0.00000. Why?
Reputation: 44256
Your third output displays 54, because on your machine,
int i=54;
is stored in memory like this:
36 00 00 00
your pointer points here:
36 00 00 00
^^
Thus, when you print that 0x36 as a char (a one-byte integer type), you see 54, since 0x36 is 54 in decimal.
This storage format is called "little endian", and is used on x86 and amd64 processors, which are quite common.
Note that the language does not guarantee that integers are stored this way; you may very well get a different result with a different machine or compiler. Don't depend on it.
The float works similarly, but is harder to show, and again it's quite machine dependent. On amd64, 3.14 is encoded as an IEEE 754 single-precision value (this is platform dependent) and its four bytes are stored least significant byte first, just like an integer.¹ The byte in the first slot, read as a signed 8-bit two's complement integer (this is also platform dependent), works out to the value you're seeing.
Last, you say:
I didn't know about little endian. But is that not the case with float? It gives 0.000000000 if I use %f in place of %d in the fourth printf (I typed %d here by mistake).
I'm going to assume you mean:
printf("%f\n",*aa);
And that aa is still a char *. This isn't valid: %f requires a double argument (a float is promoted to double when passed to printf). However, let's plow on and attempt to explain this (undefined!) behavior.
Since aa is a char *, dereferencing it reads a single byte. On your machine, 3.14 as a little-endian float is:
c3 f5 48 40
^^
0xc3, as a two's complement signed one-byte integer, is -61, which explains your fourth output: for your program, *aa is -61. When you pass this to printf, it is promoted to an int, because printf is a "varargs" (variable number of arguments) function, and a char passed in the variable part is always widened to int. Some compilers point this out:
prog1.c:14:7: warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat]
Thus, an int gets passed to printf in whatever manner your platform uses. Let's investigate that. To be explicit, I'm compiling the following:
#include<stdio.h>
int main()
{
int i=54;
float a=3.14;
char *ii,*aa;
ii=(char *)&i;
aa=(char *)&a;
printf("%u\n",ii);
printf("%u\n",aa);
printf("%d\n",*ii);
printf("%f\n",*aa);
return 0;
}
I do:
% gcc -g -o prog1 prog1.c
prog1.c: In function ‘main’:
prog1.c:11:2: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat]
prog1.c:12:2: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat]
prog1.c:14:2: warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat]
(In case it isn't clear: gcc
is throwing really good warnings here: it's pointing out undefined behavior — bugs — in your program. You should always fix these. We're going to ignore them to investigate, but note that the compiler can really do whatever it wants at this point, so everything below is anything but guaranteed.)
Then, let's start prog1 in a debugger and stop on that last printf. For me, that's line 14. Thus:
% gdb prog1
GNU gdb (Gentoo 7.6.2 p1) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /home/me/code/random/prog1...done.
(gdb) break prog1.c:14
Breakpoint 1 at 0x4005db: file prog1.c, line 14.
Let's run it up to that breakpoint.
(gdb) r
Starting program: /home/me/code/random/prog1
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
4294959628
4294959624
54
Breakpoint 1, main () at prog1.c:14
14 printf("%f\n",*aa);
Now we're stopped on the "printf
", but what does that mean? Let's look at some assembler!
(gdb) disassemble
Dump of assembler code for function main:
0x000000000040056c <+0>: push %rbp
0x000000000040056d <+1>: mov %rsp,%rbp
0x0000000000400570 <+4>: sub $0x20,%rsp
0x0000000000400574 <+8>: movl $0x36,-0x14(%rbp)
0x000000000040057b <+15>: mov 0x12f(%rip),%eax # 0x4006b0
0x0000000000400581 <+21>: mov %eax,-0x18(%rbp)
0x0000000000400584 <+24>: lea -0x14(%rbp),%rax
0x0000000000400588 <+28>: mov %rax,-0x8(%rbp)
0x000000000040058c <+32>: lea -0x18(%rbp),%rax
0x0000000000400590 <+36>: mov %rax,-0x10(%rbp)
0x0000000000400594 <+40>: mov -0x8(%rbp),%rax
0x0000000000400598 <+44>: mov %rax,%rsi
0x000000000040059b <+47>: mov $0x4006a4,%edi
0x00000000004005a0 <+52>: mov $0x0,%eax
0x00000000004005a5 <+57>: callq 0x400450 <printf@plt>
0x00000000004005aa <+62>: mov -0x10(%rbp),%rax
0x00000000004005ae <+66>: mov %rax,%rsi
0x00000000004005b1 <+69>: mov $0x4006a4,%edi
0x00000000004005b6 <+74>: mov $0x0,%eax
0x00000000004005bb <+79>: callq 0x400450 <printf@plt>
0x00000000004005c0 <+84>: mov -0x8(%rbp),%rax
0x00000000004005c4 <+88>: movzbl (%rax),%eax
0x00000000004005c7 <+91>: movsbl %al,%eax
0x00000000004005ca <+94>: mov %eax,%esi
0x00000000004005cc <+96>: mov $0x4006a8,%edi
0x00000000004005d1 <+101>: mov $0x0,%eax
0x00000000004005d6 <+106>: callq 0x400450 <printf@plt>
=> 0x00000000004005db <+111>: mov -0x10(%rbp),%rax
0x00000000004005df <+115>: movzbl (%rax),%eax
0x00000000004005e2 <+118>: movsbl %al,%eax
0x00000000004005e5 <+121>: mov %eax,%esi
0x00000000004005e7 <+123>: mov $0x4006ac,%edi
0x00000000004005ec <+128>: mov $0x0,%eax
0x00000000004005f1 <+133>: callq 0x400450 <printf@plt>
0x00000000004005f6 <+138>: mov $0x0,%eax
0x00000000004005fb <+143>: leaveq
0x00000000004005fc <+144>: retq
That's main, and the arrow (=>) is where we are. The callq instruction at 0x00000000004005f1 is the call to your fourth printf, and as you can see, some setup is required before it: all those mov instructions. Since they set up the call, and we're interested in what gets passed to printf, we need to let them run and stop right at that callq instruction. We can do this with another breakpoint:
(gdb) break *0x00000000004005f1
Breakpoint 2 at 0x4005f1: file prog1.c, line 14.
(gdb) continue
Continuing.
Breakpoint 2, 0x00000000004005f1 in main () at prog1.c:14
14 printf("%f\n",*aa);
Now we're at that call instruction. Because I'm on an amd64 chip (an Intel Core i7; these are also sometimes referred to as x86-64) and I'm not running Windows, a function is called by putting the arguments, from left to right, into certain registers. The second argument from the left is *aa, which, remember, we've established to be -61. We can dump our registers with:
(gdb) info all-registers
rax 0x0 0
rbx 0x0 0
rcx 0x2 2
rdx 0x7ffff7dd7820 140737351874592
rsi 0xffffffc3 4294967235
rdi 0x4006ac 4196012
rbp 0x7fffffffe220 0x7fffffffe220
rsp 0x7fffffffe1f8 0x7fffffffe1f8
r8 0x2 2
r9 0x7ffff7dd4640 140737351861824
r10 0x7fffffffe0d8 140737488347352
r11 0x246 582
r12 0x400480 4195456
r13 0x7fffffffe300 140737488347904
[ snip … ]
ymm0 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0,
0xff, 0x0, 0x0, 0x0, 0xff, 0x0 <repeats 19 times>}, v16_int16 = {0x0, 0x0, 0xff, 0x0, 0xff, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0x0, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xff00000000, 0xff000000ff, 0x0, 0x0}, v2_int128 = {0x000000ff000000ff000000ff00000000,
0x00000000000000000000000000000000}}
Since -61 is an integer, it ends up in an integer register; here, we can see that it's in rsi.² (It's been sign extended, which is why it's 0xffffffc3: -61 in 4 bytes instead of one.) However, %f expects a double, so printf will most likely read a floating-point register: xmm0, which gdb shows as the low half of ymm0 on my machine. It happens to be zero. That doesn't need to be true, since this is undefined behavior, but it is, and thus we get zero.
¹This isn't one of those things you care about often, except for morbid curiosity.
²The only part I couldn't explain at first is why our integer ended up in rsi. I felt like it should have been in rdi. Like I said, morbid curiosity. (Edit: Ugh, curse my curiosity. It ends up in rsi because rsi is used for the second argument, and it's the second argument; rdi holds the first, the format string. Wikipedia has it labelled as "right to left", but that only applies to stuff on the stack: registers are assigned left to right.)