Why doesn't using an uninitialized pointer cause a segmentation fault?

When I run this program:

#include<stdio.h>

int main()
{
   char *a[10];
   scanf("%s",a[0]);
   printf("%s",a[0]);
   return 0;
}

It seems to work perfectly without showing a segmentation fault.

Each element of the array a (e.g. a[0]) is a pointer that has not been initialized, so why does the program not produce a segmentation fault?

Upvotes: 2

Views: 252

Answers (1)

dbush

Reputation: 224082

When you dereference an uninitialized pointer, you invoke undefined behavior.

While this often does cause a crash, it doesn't have to. That's why it's called undefined behavior. The program may crash, it may behave in an unexpected manner, or (as you have seen) it may appear to work correctly. The behavior can also change with a seemingly unrelated code change, such as adding one or more local variables.

It also means that you can't depend on any particular behavior. If you use a different compiler, or build on a different machine, you can get different results.

Let's illustrate undefined behavior a bit more. When I ran your code, I got a segmentation fault, while for you it appears to run normally.

Below is the output I get when I run it under Valgrind:

==1047== Memcheck, a memory error detector
==1047== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1047== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==1047== Command: /tmp/x1
==1047==
hello
==1047== Conditional jump or move depends on uninitialised value(s)
==1047==    at 0x3FA445345F: _IO_vfscanf (in /lib64/libc-2.5.so)
==1047==    by 0x3FA445DCAB: scanf (in /lib64/libc-2.5.so)
==1047==    by 0x4004F2: main (x1.c:6)
==1047==
==1047== Use of uninitialised value of size 8
==1047==    at 0x3FA44534D3: _IO_vfscanf (in /lib64/libc-2.5.so)
==1047==    by 0x3FA445DCAB: scanf (in /lib64/libc-2.5.so)
==1047==    by 0x4004F2: main (x1.c:6)
==1047==
==1047==
==1047== Process terminating with default action of signal 11 (SIGSEGV)
==1047==  Bad permissions for mapped region at address 0x400520
==1047==    at 0x3FA44534D3: _IO_vfscanf (in /lib64/libc-2.5.so)
==1047==    by 0x3FA445DCAB: scanf (in /lib64/libc-2.5.so)
==1047==    by 0x4004F2: main (x1.c:6)
==1047==
==1047== HEAP SUMMARY:
==1047==     in use at exit: 0 bytes in 0 blocks
==1047==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1047==
==1047== All heap blocks were freed -- no leaks are possible
==1047==
==1047== For counts of detected and suppressed errors, rerun with: -v
==1047== Use --track-origins=yes to see where uninitialised values come from
==1047== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)

You can see from this output that an uninitialized variable is being used, which subsequently results in a segmentation violation.

Running under gdb, I get this:

GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/x1...done.
(gdb) start
Temporary breakpoint 1 at 0x4004e0: file /tmp/x1.c, line 8.
Starting program: /tmp/x1
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Temporary breakpoint 1, main () at /tmp/x1.c:8
8          scanf("%s",a[0]);
(gdb) p a
$1 = {0x400520 "L\211d$\340L\211l$\350L\215%\223\001 ",
  0x4003bb "H\203\304\b\303\377\065\312\004 ",
  0xca000000000001 <Address 0xca000000000001 out of bounds>,
  0x400557 "H\215\005f\001 ", 0x0, 0x3fa421cbc0 "",
  0x400520 "L\211d$\340L\211l$\350L\215%\223\001 ", 0x0,
  0x7fffffffe860 "\001", 0x0}
(gdb) step
hello

Program received signal SIGSEGV, Segmentation fault.
0x0000003fa44534d3 in _IO_vfscanf_internal () from /lib64/libc.so.6
(gdb)

Make note of what a contains. Now, I'll make a small change:

#include<stdio.h>

int main()
{
   int x[100];
   char *a[10];
   int y[100];
   scanf("%s",a[0]);
   printf("%s",a[0]);
   return 0;
}

I added one local array before a and another after it. In a well-behaved program, this wouldn't change anything. But when undefined behavior is present, all bets are off.

Running under Valgrind:

==1392== Memcheck, a memory error detector
==1392== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1392== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==1392== Command: /tmp/x1
==1392==
hello
==1392== Conditional jump or move depends on uninitialised value(s)
==1392==    at 0x3FA445345F: _IO_vfscanf (in /lib64/libc-2.5.so)
==1392==    by 0x3FA445DCAB: scanf (in /lib64/libc-2.5.so)
==1392==    by 0x4004F8: main (x1.c:8)
==1392==
==1392== Conditional jump or move depends on uninitialised value(s)
==1392==    at 0x3FA4443D1C: vfprintf (in /lib64/libc-2.5.so)
==1392==    by 0x3FA444CD09: printf (in /lib64/libc-2.5.so)
==1392==    by 0x40050E: main (x1.c:9)
==1392==
(null)==1392==
==1392== HEAP SUMMARY:
==1392==     in use at exit: 0 bytes in 0 blocks
==1392==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1392==
==1392== All heap blocks were freed -- no leaks are possible
==1392==
==1392== For counts of detected and suppressed errors, rerun with: -v
==1392== Use --track-origins=yes to see where uninitialised values come from
==1392== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)

No segfault this time; however, "(null)" is printed instead of the input string "hello".

Under gdb:

GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/x1...done.
(gdb) start
Temporary breakpoint 1 at 0x4004e3: file /tmp/x1.c, line 8.
Starting program: /tmp/x1
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Temporary breakpoint 1, main () at /tmp/x1.c:8
8          scanf("%s",a[0]);
(gdb) p a
$1 = {0x0, 0x7fffffffe5d0 "", 0xf63d4e2e <Address 0xf63d4e2e out of bounds>,
  0x7fffffffe760 "0\005@", 0x7fffffffe778 "", 0x3fa4403a90 "", 0x0,
  0x2aaaaaaaf630 "\021\003@", 0x2aaaaaaaf0f0 "", 0x4002ff "__libc_start_main"}
(gdb) step
hello
9          printf("%s",a[0]);
(gdb)
10         return 0;
(gdb)
11      }
(gdb)
0x0000003fa441d9f4 in __libc_start_main () from /lib64/libc.so.6
(gdb)
Single stepping until exit from function __libc_start_main,
which has no line number information.
(null)
Program exited normally.
(gdb) quit

In particular, note the contents of a in each case. For the original program, a[0] contains 0x400520, while in the modified program a[0] contains 0x0.

To summarize, there are no guarantees when it comes to undefined behavior. To avoid it, be sure to compile with all warnings enabled (-Wall -Wextra for GCC) and use memory checkers like Valgrind to catch situations where you're reading from or writing to places you shouldn't.

Upvotes: 2
