Engineer999

Reputation: 3955

What will the compiler do

I've been programming for a few years, but embarrassingly, there are one or two things I'm still not fully clear about.

In the basic example code below, when the compiler encounters myFunc(), where will str1 and str2 get stored?

They are pointers to string literals so I assume the string literal will get stored in read only memory, but what is the difference in this case between one pointer being static local and the other one not? Also, I thought local variables will get stored on the stack and they are not allocated until the function is called? This is confusing.

In the case of the integers, var1 is non-static but var2 is static. Will the compiler place var2 in the data segment at compilation time? I've read in another post, "When do function-level static variables get allocated/initialized?", that local static variables are created and initialised the first time they are used, not during compilation. So in that case, what if the function is never called?

Thanks in advance for experienced knowledge.

EDITED: To call myFunc() from main(). It was a typo as myFunc() was never even called

int myFunc()
{
    static char* str1 = "Hello";
    char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;

}

int main()
{

    return myFunc();
}

Upvotes: 1

Views: 461

Answers (6)

Keith Thompson

Reputation: 263337

What the compiler does must be based (assuming a correctly working compiler) on the semantics of the code, so that's what I'll discuss.

First, a fairly minor point. By declaring a function with (), you specify that it takes a fixed but unspecified number and type(s) of arguments. That's an obsolescent form of declaration/definition, and there's rarely if ever a good reason to use it. (Empty parentheses have a different meaning in C++, but you're asking about C.) To specify that a function has no parameters, use (void) rather than () (especially for main, since it's not 100% clear that int main() must be accepted by a conforming compiler).

With that change:

int myFunc(void)
{
    static char* str1 = "Hello";
    char* str2 = "World";
    int var1 = 1;
    static int var2 = 8;
}

int main(void)
{
    return myFunc();
}

This program does nothing; it produces no output, and has no side effects. A compiler is permitted to compile it down to nearly nothing. But let's ignore that and assume that nothing is discarded.

There are two important concepts to consider: scope and lifetime (also known as storage duration). The scope of an identifier is the region of program text in which it is visible. It's purely a compile-time concept. The lifetime of an object is the duration during execution in which that object exists. It's purely a run-time concept. The two are often confused, particularly when you use the words "local" and "global".

An object with automatic storage duration is created on entry to the block in which it's defined, and (logically) destroyed on exit from that block. In your program, the relevant block is enclosed by the { and } in the definition of myFunc().

An object with static storage duration exists during the entire run time of the program.

static char* str1 = "Hello";

"Hello" is a string literal. It specifies a static array of type char[6]; that array (at least logically) exists during the entire execution of the program. You are not allowed to modify the contents of that array -- but for historical reasons, it's not const, and a compiler isn't required to warn you if you try to modify it. String literals are commonly stored in read-only memory (probably not physical ROM, but virtual memory that's marked as read-only).

The pointer object str1 also has static storage duration, though its name is visible only within the enclosing block ("block scope"). It's initialized to point to the initial character of "Hello". This initialization logically occurs before entry to main. Since a string literal is effectively read-only, it would have been better to use const to avoid the risk of accidentally trying to modify it:

static const char *str1 = "Hello";

Next:

char* str2 = "World";

The name of the pointer object str2 has the same kind of block scope as str1, but the pointer object itself has automatic storage duration. It is created on entry to the enclosing block and destroyed on exit. It's initialized to point to the initial character of "World"; that initialization takes place when execution reaches the declaration. Again, it would be better to add a const to the declaration.

int var1 = 1;
static int var2 = 8;

var1 has block scope and automatic storage duration. It's initialized to 1 when its declaration is reached at run time. var2 has block scope and static storage duration. The object exists for the entire execution of the program, and it's initialized to 8 before entry to main().

Now we run into a bit of a problem. You've defined myFunc() to return an int result, but you don't actually return anything. As it happens, this isn't invalid by itself, but if the result is used by a caller (as it is by your main() function), the behavior is undefined. The fix is simple: add a return 0; before the closing }.

Assuming you've added that, main calls myFunc. During execution of myFunc, str2 and var1 are allocated somehow and are initialized as I've described. (Nothing happens to str1 or var2 because they're static.) On return from the function, the storage allocated for str2 and var1 is released, effectively destroying the objects.


But the question you asked was: What will the compiler do? And the answer to that is: It will generate whatever code is necessary to implement the semantics I've just described. That's really all the C standard requires.

In practice, most compilers generate code that allocates variables with automatic storage duration on the "stack". The "stack" is usually a contiguous region of memory, starting from some fixed base address, that grows in one direction as items are added to it and shrinks in the other direction as items are removed. It's typically managed via a CPU register, the "stack pointer". (Some CPUs also have a "frame pointer".) But in fact all that the C standard requires is that such objects are allocated and deallocated in a first-in last-out manner -- and the actual allocation and deallocation needn't take place when you'd expect, as long as the resulting behavior is the same. For example, if you define a local object inside a loop, it might be allocated and deallocated on each iteration, or its allocation might be folded into the surrounding scope. The C standard doesn't care (and, in most cases, neither should you). There are even some compilers that don't use a contiguous stack at all; rather the storage for each function call is allocated from a heap. A contiguous stack is the best solution 90+% of the time, but it's not required.

Objects with static storage duration are typically allocated on program startup, before main is called. Most systems store the initial contents of any initialized static objects in the executable file, so it can be loaded into memory. (That's likely to include string literals.) For static objects whose initial value is zero, the executable might just contain information about how much zeroed memory to allocate.

As for the generated instructions that operate on this data, that is entirely dependent on the CPU being targeted, and probably on the system ABI.

Upvotes: 2

mgarey

Reputation: 752

EDIT:

The other answer and comments are correct - as is, your variables will be optimized out because they aren't even used. But let's have a little fun and actually use them to see what happens.

I compiled the OP's program as-is with gcc -S trial.c, and although myFunc was never called, nothing else about this answer changes.

I've slightly modified your program to actually use those variables so we can learn a little more about what the compiler and linker will do. Here it is:

#include <stdio.h>

int myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);
    return 0;
}

int main()
{
    return myFunc();
}

I compiled with gcc -S trial.c and got the following assembly file:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "%s %s %d %d\12\0"
    .text
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $64, %rsp
    .seh_stackalloc 64
    .seh_endprologue
    leaq    .LC0(%rip), %rax
    movq    %rax, -8(%rbp)
    movl    $1, -12(%rbp)
    movl    var2.3086(%rip), %edx
    movq    str1.3083(%rip), %rax
    movl    -12(%rbp), %r8d
    movq    -8(%rbp), %rcx
    movl    %edx, 32(%rsp)
    movl    %r8d, %r9d
    movq    %rcx, %r8
    movq    %rax, %rdx
    leaq    .LC1(%rip), %rcx
    call    printf
    movl    $0, %eax
    addq    $64, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $32, %rsp
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
    call    myFunc
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .data
    .align 4
var2.3086:
    .long   8
    .section .rdata,"dr"
.LC2:
    .ascii "Hello\0"
    .data
    .align 8
str1.3083:
    .quad   .LC2
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

var1 has no label of its own in the assembly file. It's just the constant 1 that gets stored into a stack slot (movl $1, -12(%rbp)).

At the top of the assembly file, we see "World" (the literal that str2 points to) in the .rdata section. Lower down in the assembly file, the string "Hello" is in the .rdata section, but the label for str1 (which contains the label, or address, for "Hello") is in the .data section. var2 is also in the .data section.

Here's a stackoverflow question that delves a little deeper into why this happens.

Another stackoverflow question points out that the .rdata section is the read-only section of .data and explains the different sections.

Hope this helps.


EDIT:

I decided to try this with the -O3 compiler flag (high optimizations). Here's the assembly file that I got:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
    .section    .text.unlikely,"x"
.LCOLDB3:
    .text
.LHOTB3:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $8, 32(%rsp)
    movl    $1, %r9d
    call    printf
    nop
    addq    $56, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE3:
    .text
.LHOTE3:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB4:
    .section    .text.startup,"x"
.LHOTB4:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .section    .text.startup,"x"
.LHOTE4:
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

var1 is now just a constant 1 that is placed in a register (r9d). var2 is also just a constant, but it's placed on the stack (movl $8, 32(%rsp)) as the fifth argument to printf. Also, the addresses of "Hello" and "World" are loaded straight into the argument registers, which is more direct (and more efficient).

So, I decided that I wanted to try something slightly different:

#include <stdio.h>

void myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);

    var1++;
    var2++;
    printf("%d %d", var1, var2);
}

int main()
{
    myFunc();
    myFunc();
    return 0;
}

And the associated assembly using gcc -O3 -S trial.c

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
.LC3:
    .ascii "%d %d\0"
    .section    .text.unlikely,"x"
.LCOLDB4:
    .text
.LHOTB4:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    movl    var2.3086(%rip), %eax
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $1, %r9d
    movl    %eax, 32(%rsp)
    call    printf
    movl    var2.3086(%rip), %eax
    leaq    .LC3(%rip), %rcx
    movl    $2, %edx
    leal    1(%rax), %r8d
    movl    %r8d, var2.3086(%rip)
    addq    $56, %rsp
    jmp printf
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .text
.LHOTE4:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB5:
    .section    .text.startup,"x"
.LHOTB5:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    call    myFunc
    call    myFunc
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE5:
    .section    .text.startup,"x"
.LHOTE5:
    .data
    .align 4
var2.3086:
    .long   8
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

This is looking a little more like the original. var1 is still optimized to just constants, but var2 is now in the .data section again. "Hello" and "World" are still in the .rdata section because they are constant.

One of the comments points out that this would be different on different platforms with different compilers. I encourage you to try it out.

Upvotes: 11

GreatAndPowerfulOz

Reputation: 1775

static variables, even those within a function scope, will get stored at global scope. The static variables within a function or scope will get initialized only the first time that function or scope is entered. Non-static variables will get allocated or stored on the stack in most compilers when function scope is entered and initialized when scope is entered. Some compilers store local variables elsewhere.

Upvotes: 0

Jean-Baptiste Yunès

Reputation: 36401

Your code cannot be compiled without at least a warning as the function never returns anything which contradicts the return type specification.

Anyway, on my machine it generates code. If you don't use any optimization, code is emitted for the function to allocate the local str2. str1 and var2 are allocated in the data section and hold their respective values. If you use optimization, only trivial code is emitted, and the unused local variables disappear, as do the unused static ones.

To observe this you can at least examine the object code with nm:

$ gcc -o p p.c
$ nm p
0000000100000f90 T _main
0000000100000f70 T _myFunc
0000000100001000 d _myFunc.str1
0000000100001008 d _myFunc.var2
$ gcc -O3 -o p2 p.c
$ nm p2
0000000100000fb0 T _main
0000000100000fa0 T _myFunc

If you want more details, then generate assembler code with -S and observe what happens.

Upvotes: 1

caps

Reputation: 1243

static const char* str1 = "Hello";

str1 is a static local pointer to a string literal which will be stored in read-only memory.

const char* str2 = "World";

str2 is a local, "stack-allocated" pointer to a string literal which will be stored in read-only memory.

The values of str1 and str2 are the respective addresses of the string literals they point to.

int var1 = 1;
static int var2 = 8;

If these lines of code are never reached, var2 will never be initialized. I don't know whether the compiler sets aside a block of memory for it somewhere else at compile time or not.

Upvotes: 3

Lightness Races in Orbit

Reputation: 385194

The compiler will produce a program that takes no input, does nothing, then emits no output.

All of those declarations are completely irrelevant as they do not contribute anything to the [non-existent] result of the program. You might say they "get optimised out", though the reality is that they literally have no analogue in your resulting compiled executable.

Upvotes: 0
