Dan Oberlam

Reputation: 2496

Different segfaults for command line parsing by different compilers

I've been learning C by working through Zed Shaw's tutorial and ran into some issues on this exercise. The code is as follows:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 0;
    while(i < argc) {
        printf("arg %d: %s\n", i, argv[i]);
        i++;
    }
    // section removed for brevity
    return 0;
}

I'm using Windows and don't like the hassle of virtualboxing, so I've been running in Cygwin. I have two compilers: the gcc that came with Cygwin, and the version of gcc that comes with mingw, which I have so I can use DrMemory.

I make the file (named ex11.c) like this

# Makefile
ex11: ex11.c
    gcc -o ex11.exe ex11.c
    i686-pc-mingw32-gcc.exe -static-libgcc -static-libstdc++ -ggdb -o ex11b.exe ex11.c

# Command Line
>>> make ex11
   ...
   etc

The command for the second one I got here.

$ gcc --version
gcc.exe (rubenvb-4.6.3) 4.6.3
$ i686-pc-mingw32-gcc --version
i686-pc-mingw32-gcc (GCC) 4.7.3

Then when I run them (./ex11 and ./ex11b) I get issues. Running the normal version (without the b) without command line arguments gives me a segfault. Running with arguments gives me this output:

$ ./ex11 a
arg 0: a
arg 1: a

Running the mingw version (with b) I have no problems without command line arguments:

$ ./ex11b
arg 0: (null)

but then running the same with a command line argument ($ ./ex11b a) segfaults me.

Assembler output of the first

    .file   "ex11.c"
    .def    __main; .scl    2;  .type   32; .endef
    .section .rdata,"dr"
.LC0:
    .ascii "arg %d: %s\12\0"
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_setframe   %rbp, 48
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    call    __main
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    movq    24(%rbp), %rax
    addq    $72, %rax
    movq    (%rax), %rcx
    leaq    .LC0(%rip), %rax
    movl    -4(%rbp), %edx
    movq    %rcx, %r8
    movq    %rax, %rcx
    call    printf
    addl    $1, -4(%rbp)
.L2:
    movl    -4(%rbp), %eax
    cmpl    16(%rbp), %eax
    jl  .L3
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .def    printf; .scl    2;  .type   32; .endef

Assembler output of the second

    .file   "ex11.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "arg %d: %s\12\0"
    .text
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB6:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    call    ___main
    movl    $0, 28(%esp)
    jmp L2
L3:
    movl    12(%ebp), %eax
    addl    $36, %eax
    movl    (%eax), %eax
    movl    %eax, 8(%esp)
    movl    28(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
    addl    $1, 28(%esp)
L2:
    movl    28(%esp), %eax
    cmpl    8(%ebp), %eax
    jl  L3
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE6:
    .def    _printf;    .scl    2;  .type   32; .endef

I think I know what causes the segfault: i starts at 0, so I'm sometimes handing printf a null pointer for %s, which it doesn't like. What I'm wondering is what differs between these two compilers that makes them break in different ways.

I'm also curious how I could rewrite this so I can start at i=0

Upvotes: 1

Views: 105

Answers (2)

Nisse Engström

Reputation: 4752

Your compilers or environment appear to be broken somehow. All the elements of the argv[] array must point to strings, and only argv[argc] must be NULL. If the program name is not available, then argv[0] must point to an empty string ("").

What you can do is test for NULL in your loop, but you really shouldn't have to:

while(i < argc) {
    if (argv[i]) {
        printf("arg %d: %s\n", i, argv[i]);
    }
    i++;
}
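
To start at i = 0 and still survive an environment that hands over a NULL entry, a minimal sketch along these lines might help. The helper name print_args and the "(null)" placeholder are my own choices for illustration, not anything from the tutorial; the only thing it relies on is that passing NULL to %s is undefined behavior and should be avoided.

```c
#include <stdio.h>

/* Hypothetical helper: prints argv[0] .. argv[argc - 1], starting at 0.
 * A NULL entry is replaced by a placeholder string instead of being
 * passed to %s. Returns the number of NULL entries encountered. */
int print_args(int argc, char *argv[])
{
    int i = 0;
    int nulls = 0;

    while (i < argc) {
        const char *arg = argv[i];
        if (arg == NULL) {
            arg = "(null)";   /* placeholder chosen for this sketch */
            nulls++;
        }
        printf("arg %d: %s\n", i, arg);
        i++;
    }
    return nulls;
}
```

Calling print_args(argc, argv) from main replaces the original loop. On a conforming implementation the return value should always be 0, so a nonzero result is a hint that the toolchain or shell is misbehaving.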

Upvotes: 4

user539810

Your compilers appear to be generating incorrect addresses, specifically the calculation of the array offset for argv as shown in the 32-bit (i686) assembly code:

L3:
    movl    12(%ebp), %eax
    addl    $36, %eax
    movl    (%eax), %eax
    movl    %eax, 8(%esp) 
    movl    28(%esp), %eax 
    movl    %eax, 4(%esp)
...
call _printf

Detangling that mess, you end up with the following:

printf(..., i, argv[9]); // 36 bytes / 4 bytes per pointer == index 9

The addl $1, 28(%esp) instruction is the i++ in your C code, so you can guess what the equivalents of 12(%ebp) and 28(%esp) are in your C code.

Anyway, the big picture is this: argv[i] is never used, because it is always argv[9] that is passed to printf. Instead of the constant addl $36, %eax, correct code would add i scaled by the pointer size (i * 4 on i686) to the base address of argv, and nothing like that appears in the assembly code you provided. The 64-bit listing has the same problem: addq $72, %rax is also argv[9], this time with 8-byte pointers.
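
The index can be checked with a little arithmetic: indexing a char ** by i adds i * sizeof(char *) bytes to the base address. Here is a small sketch of that calculation, assuming 4-byte pointers for the i686 build and 8-byte pointers for the 64-bit build. The helper names are hypothetical and only there to illustrate the point.

```c
#include <stddef.h>

/* Hypothetical helper: given the constant byte offset added to argv in the
 * assembly and the pointer size of the target, return the array index
 * being read. Assumes byte_offset is a multiple of ptr_size. */
size_t index_from_offset(size_t byte_offset, size_t ptr_size)
{
    return byte_offset / ptr_size;
}

/* The forward direction: the byte offset a compiler has to add to the
 * base of argv in order to reach argv[i]. */
size_t offset_for_index(size_t i, size_t ptr_size)
{
    return i * ptr_size;
}
```

Both index_from_offset(36, 4) and index_from_offset(72, 8) come out as 9, which is why the two listings read the same element no matter what i is.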

In other words, your code isn't being compiled correctly with either compiler. Unfortunately I have no idea what is causing the problem. Have you tried using i686-pc-mingw32-gcc.exe outside of the Cygwin shell? Maybe your Cygwin installation is somehow messed up?

Upvotes: 1
