Reputation: 3955
I've been programming for a few years but, embarrassingly, there are one or two things I'm still not fully clear about.
In the basic example code below, when the compiler encounters myFunc(), where will str1 and str2 get stored?
They are pointers to string literals so I assume the string literal will get stored in read only memory, but what is the difference in this case between one pointer being static local and the other one not? Also, I thought local variables will get stored on the stack and they are not allocated until the function is called? This is confusing.
In the case of the integers, var1 is non-static but var2 is static. Will the compiler place var2 in the data segment at compilation time? I've read in another post, "When do function-level static variables get allocated/initialized?", that local static variables get created and initialised the first time they are used and not during compilation. So in that case, what if the function is never called?
Thanks in advance for experienced knowledge.
EDITED: main() now calls myFunc(). It was a typo; originally myFunc() was never even called.
int myFunc()
{
static char* str1 = "Hello";
char* str2 = "World";
int var1 = 1;
static int var2 = 8;
}
int main()
{
return myFunc();
}
Upvotes: 1
Views: 461
Reputation: 263337
What the compiler does must be based (assuming a correctly working compiler) on the semantics of the code, so that's what I'll discuss.
First, a fairly minor point. By declaring a function with (), you specify that it takes a fixed but unspecified number and type(s) of arguments. That's an obsolescent form of declaration/definition, and there's rarely if ever a good reason to use it. (Empty parentheses have a different meaning in C++, but you're asking about C.) To specify that a function has no parameters, use (void) rather than () (especially for main, since it's not 100% clear that int main() must be accepted by a conforming compiler).
With that change:
int myFunc(void)
{
static char* str1 = "Hello";
char* str2 = "World";
int var1 = 1;
static int var2 = 8;
}
int main(void)
{
return myFunc();
}
This program does nothing; it produces no output, and has no side effects. A compiler is permitted to compile it down to nearly nothing. But let's ignore that and assume that nothing is discarded.
There are two important concepts to consider: scope and lifetime (also known as storage duration). The scope of an identifier is the region of program text in which it is visible. It's purely a compile-time concept. The lifetime of an object is the duration during execution in which that object exists. It's purely a run-time concept. The two are often confused, particularly when you use the words "local" and "global".
An object with automatic storage duration is created on entry to the block in which it's defined, and (logically) destroyed on exit from that block. In your program, the relevant block is enclosed by the { and } in the definition of myFunc().
An object with static storage duration exists during the entire run time of the program.
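To make the difference concrete, here is a minimal sketch with one object of each storage duration. The function name count_calls and both variable names are made up for illustration; they are not from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reports how many times it has been called. */
int count_calls(void)
{
    static int calls = 0;  /* static storage duration: one object, keeps its value between calls */
    int scratch = 0;       /* automatic storage duration: a fresh object on every entry */

    calls++;
    scratch++;
    printf("calls = %d, scratch = %d\n", calls, scratch);
    return calls;
}
```

Calling it repeatedly prints a growing value for calls, while scratch is 1 every time, because calls is a single object that lives for the whole run and scratch is created anew on each entry to the block.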
static char* str1 = "Hello";
"Hello"
is a string literal. It specifies a static array of type char[6]; that array (at least logically) exists during the entire execution of the program. You are not allowed to modify the contents of that array -- but for historical reasons, it's not const, and a compiler isn't required to warn you if you try to modify it. String literals are commonly stored in read-only memory (probably not physical ROM, but virtual memory that's marked as read-only).
The pointer object str1 also has static storage duration, though its name is visible only within the enclosing block ("block scope"). It's initialized to point to the initial character of "Hello". This initialization logically occurs before entry to main. Since a string literal is effectively read-only, it would have been better to use const to avoid the risk of accidentally trying to modify it:
static const char *str1 = "Hello";
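The split between scope and lifetime for a block-scope static object can be sketched like this. The name get_counter is hypothetical, chosen only for this illustration:

```c
/* Hypothetical helper: hands out the address of a block-scope static object. */
int *get_counter(void)
{
    static int counter = 8;  /* the name is visible only inside this block... */
    return &counter;         /* ...but the object exists for the whole run, so this pointer stays valid */
}
```

A caller cannot refer to counter by name, yet it can read and modify the object through the returned pointer, and every call returns the same address. Doing the same with a non-static local would return a pointer to an object whose lifetime has already ended.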
Next:
char* str2 = "World";
The name of the pointer object str2 has the same kind of block scope as str1, but the pointer object itself has automatic storage duration. It is created on entry to the enclosing block and destroyed on exit. It's initialized to point to the initial character of "World"; that initialization takes place when execution reaches the declaration. Again, it would be better to add a const to the declaration.
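As a small sketch of why the const helps (modify_copy is a made-up name for this illustration): a pointer to a string literal should only be read through, whereas an array initialized from the same literal is an ordinary modifiable object.

```c
#include <stdio.h>

/* Hypothetical demo: returns the first character of the modified copy. */
char modify_copy(void)
{
    const char *lit = "World";  /* points at the string literal; const stops accidental writes */
    char buf[] = "World";       /* a separate automatic array, initialized from the literal */

    buf[0] = 'w';               /* fine: buf is our own modifiable array */
    /* lit[0] = 'w'; */         /* rejected by the compiler because of const */
    printf("%s %s\n", lit, buf);
    return buf[0];
}
```

Without the const, the commented-out assignment would compile, and executing it would be undefined behavior.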
int var1 = 1;
static int var2 = 8;
var1 has block scope and automatic storage duration. It's initialized to 1 when its declaration is reached at run time. var2 has block scope and static storage duration. The object exists for the entire execution of the program, and it's initialized to 8 before entry to main().
Now we run into a bit of a problem. You've defined myFunc() to return an int result, but you don't actually return anything. As it happens, this isn't invalid by itself, but if the result is used by a caller (as it is by your main() function), the behavior is undefined. The fix is simple: add a return 0; before the closing }.
Assuming you've added that, main calls myFunc. During execution of myFunc, str2 and var1 are allocated somehow and are initialized as I've described. (Nothing happens to str1 or var2 because they're static.) On return from the function, the storage allocated for str2 and var1 is released, effectively destroying the objects.
But the question you asked was: What will the compiler do? And the answer to that is: It will generate whatever code is necessary to implement the semantics I've just described. That's really all the C standard requires.
In practice, most compilers generate code that allocates variables with automatic storage duration on the "stack". The "stack" is usually a contiguous region of memory, starting from some fixed base address, that grows in one direction as items are added to it and shrinks in the other direction as items are removed. It's typically managed via a CPU register, the "stack pointer". (Some CPUs also have a "frame pointer".) But in fact all that the C standard requires is that such objects are allocated and deallocated in a first-in last-out manner -- and the actual allocation and deallocation needn't take place when you'd expect, as long as the resulting behavior is the same. For example, if you define a local object inside a loop, it might be allocated and deallocated on each iteration, or its allocation might be folded into the surrounding scope. The C standard doesn't care (and, in most cases, neither should you). There are even some compilers that don't use a contiguous stack at all; rather the storage for each function call is allocated from a heap. A contiguous stack is the best solution 90+% of the time, but it's not required.
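One thing the standard does guarantee, whatever the allocation scheme, is that each active call gets its own instance of an automatic object. A rough sketch (distinct_instances is a hypothetical name; the printed addresses are implementation-specific and shown only for curiosity):

```c
#include <stdio.h>

/* Hypothetical helper: recurses `depth` more levels and returns 1 if each
   call's `local` had a different address from its caller's `local`. */
int distinct_instances(int depth, const int *callers_local)
{
    int local = depth;  /* automatic: one instance per active call */

    printf("depth %d: local at %p\n", depth, (void *)&local);
    if (callers_local != NULL && &local == callers_local)
        return 0;       /* should not happen: both objects are alive at the same time */
    if (depth == 0)
        return 1;
    return distinct_instances(depth - 1, &local);
}
```

On a typical stack-based implementation the printed addresses step in one direction as the recursion deepens, but nothing in the C standard promises that pattern.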
Objects with static storage duration are typically allocated on program startup, before main is called. Most systems store the initial contents of any initialized static objects in the executable file, so it can be loaded into memory. (That's likely to include string literals.) For static objects whose initial value is zero, the executable might just contain information about how much zeroed memory to allocate.
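A minimal sketch of the two cases, with made-up names (table, seed, table_sum, seed_value). The section names in the comments describe what common toolchains tend to do, not something the C standard specifies:

```c
#include <stddef.h>

static int table[1024];  /* no initializer: starts as all zeros; typically placed in .bss */
static int seed = 8;     /* nonzero initializer: the value is typically stored in the file (.data) */

/* Hypothetical helper: sums the zero-initialized table. */
int table_sum(void)
{
    int sum = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        sum += table[i];
    return sum;
}

/* Hypothetical helper: reads the initialized static. */
int seed_value(void)
{
    return seed;
}
```

Both values are already in place the first time either function is called; no initialization code needs to run inside the functions themselves.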
As for the generated instructions that operate on this data, that is entirely dependent on the CPU being targeted, and probably on the system ABI.
Upvotes: 2
Reputation: 752
EDIT:
The other answer and comments are correct - as is, your variables will be optimized out because they aren't even used. But let's have a little fun and actually use them to see what happens.
I compiled the OP's program as-is with gcc -S trial.c, and although myFunc was never called, nothing else about this answer changes.
I've slightly modified your program to actually use those variables so we can learn a little more about what the compiler and linker will do. Here it is:
#include <stdio.h>
int myFunc()
{
static const char* str1 = "Hello";
const char* str2 = "World";
int var1 = 1;
static int var2 = 8;
printf("%s %s %d %d\n", str1, str2, var1, var2);
return 0;
}
int main()
{
return myFunc();
}
I compiled with gcc -S trial.c and got the following assembly file:
.file "trial.c"
.section .rdata,"dr"
.LC0:
.ascii "World\0"
.LC1:
.ascii "%s %s %d %d\12\0"
.text
.globl myFunc
.def myFunc; .scl 2; .type 32; .endef
.seh_proc myFunc
myFunc:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $64, %rsp
.seh_stackalloc 64
.seh_endprologue
leaq .LC0(%rip), %rax
movq %rax, -8(%rbp)
movl $1, -12(%rbp)
movl var2.3086(%rip), %edx
movq str1.3083(%rip), %rax
movl -12(%rbp), %r8d
movq -8(%rbp), %rcx
movl %edx, 32(%rsp)
movl %r8d, %r9d
movq %rcx, %r8
movq %rax, %rdx
leaq .LC1(%rip), %rcx
call printf
movl $0, %eax
addq $64, %rsp
popq %rbp
ret
.seh_endproc
.def __main; .scl 2; .type 32; .endef
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
call myFunc
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.data
.align 4
var2.3086:
.long 8
.section .rdata,"dr"
.LC2:
.ascii "Hello\0"
.data
.align 8
str1.3083:
.quad .LC2
.ident "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
.def printf; .scl 2; .type 32; .endef
var1 isn't even found in the assembly file. It's actually just a constant that gets loaded onto the stack.
At the top of the assembly file, we see "World" (str2) in the .rdata section. Lower down in the assembly file, the string "Hello" is in the .rdata section, but the label for str1 (which contains the label, or address, for "Hello") is in the .data section. var2 is also in the .data section.
Here's a stackoverflow question that delves a little deeper into why this happens.
Another stackoverflow question points out that the .rdata section is the read-only section of .data and explains the different sections.
Hope this helps.
EDIT:
I decided to try this with the -O3 compiler flag (high optimizations). Here's the assembly file that I got:
.file "trial.c"
.section .rdata,"dr"
.LC0:
.ascii "World\0"
.LC1:
.ascii "Hello\0"
.LC2:
.ascii "%s %s %d %d\12\0"
.section .text.unlikely,"x"
.LCOLDB3:
.text
.LHOTB3:
.p2align 4,,15
.globl myFunc
.def myFunc; .scl 2; .type 32; .endef
.seh_proc myFunc
myFunc:
subq $56, %rsp
.seh_stackalloc 56
.seh_endprologue
leaq .LC0(%rip), %r8
leaq .LC1(%rip), %rdx
leaq .LC2(%rip), %rcx
movl $8, 32(%rsp)
movl $1, %r9d
call printf
nop
addq $56, %rsp
ret
.seh_endproc
.section .text.unlikely,"x"
.LCOLDE3:
.text
.LHOTE3:
.def __main; .scl 2; .type 32; .endef
.section .text.unlikely,"x"
.LCOLDB4:
.section .text.startup,"x"
.LHOTB4:
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.section .text.unlikely,"x"
.LCOLDE4:
.section .text.startup,"x"
.LHOTE4:
.ident "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
.def printf; .scl 2; .type 32; .endef
var1 is now just a constant 1 that is placed in a register (r9d). var2 is also just a constant, but it's placed on the stack. Also, the strings "Hello" and "World" are accessed in a more direct (efficient) way.
So, I decided that I wanted to try something slightly different:
#include <stdio.h>
void myFunc()
{
static const char* str1 = "Hello";
const char* str2 = "World";
int var1 = 1;
static int var2 = 8;
printf("%s %s %d %d\n", str1, str2, var1, var2);
var1++;
var2++;
printf("%d %d", var1, var2);
}
int main()
{
myFunc();
myFunc();
return 0;
}
And the associated assembly using gcc -O3 -S trial.c:
.file "trial.c"
.section .rdata,"dr"
.LC0:
.ascii "World\0"
.LC1:
.ascii "Hello\0"
.LC2:
.ascii "%s %s %d %d\12\0"
.LC3:
.ascii "%d %d\0"
.section .text.unlikely,"x"
.LCOLDB4:
.text
.LHOTB4:
.p2align 4,,15
.globl myFunc
.def myFunc; .scl 2; .type 32; .endef
.seh_proc myFunc
myFunc:
subq $56, %rsp
.seh_stackalloc 56
.seh_endprologue
movl var2.3086(%rip), %eax
leaq .LC0(%rip), %r8
leaq .LC1(%rip), %rdx
leaq .LC2(%rip), %rcx
movl $1, %r9d
movl %eax, 32(%rsp)
call printf
movl var2.3086(%rip), %eax
leaq .LC3(%rip), %rcx
movl $2, %edx
leal 1(%rax), %r8d
movl %r8d, var2.3086(%rip)
addq $56, %rsp
jmp printf
.seh_endproc
.section .text.unlikely,"x"
.LCOLDE4:
.text
.LHOTE4:
.def __main; .scl 2; .type 32; .endef
.section .text.unlikely,"x"
.LCOLDB5:
.section .text.startup,"x"
.LHOTB5:
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
call myFunc
call myFunc
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.section .text.unlikely,"x"
.LCOLDE5:
.section .text.startup,"x"
.LHOTE5:
.data
.align 4
var2.3086:
.long 8
.ident "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
.def printf; .scl 2; .type 32; .endef
This is looking a little more like the original. var1 is still optimized to just constants, but var2 is now in the .data section again. "Hello" and "World" are still in the .rdata section because they are constant.
One of the comments points out that this would be different on different platforms with different compilers. I encourage you to try it out.
Upvotes: 11
Reputation: 1775
static variables, even those within a function scope, will get stored at global scope. The static variables within a function or scope will get initialized only the first time that function or scope is entered. Non-static variables will get allocated or stored on the stack in most compilers when function scope is entered and initialized when scope is entered. Some compilers store local variables elsewhere.
Upvotes: 0
Reputation: 36401
Your code cannot be compiled without at least a warning as the function never returns anything which contradicts the return type specification.
Anyway, on my machine it generates code. Without optimization, code is emitted for the function to allocate the local str2. str1 and var2 are allocated in the data section, holding their respective values. With optimization, only trivial code is emitted, and the unused local variables disappear, as do the unused statics.
To observe this you can at least examine the object code with nm:
$ gcc -o p p.c
$ nm p
0000000100000f90 T _main
0000000100000f70 T _myFunc
0000000100001000 d _myFunc.str1
0000000100001008 d _myFunc.var2
$ gcc -O3 -o p2 p.c
$ nm p2
0000000100000fb0 T _main
0000000100000fa0 T _myFunc
If you want more details, then generate assembler code with -S and observe what happens.
Upvotes: 1
Reputation: 1243
static const char* str1 = "Hello";
str1 is a static local pointer to a string literal which will be stored in read-only memory.
const char* str2 = "World";
str2 is a local, "stack-allocated" pointer to a string literal which will be stored in read-only memory.
The values of str1 and str2 are the respective addresses of the string literals they point to.
int var1 = 1;
static int var2 = 8;
If these lines of code are never reached, var2 will never be initialized. I don't know whether the compiler sets aside a block of memory for it somewhere else at compile time or not.
Upvotes: 3
Reputation: 385194
The compiler will produce a program that takes no input, does nothing, then emits no output.
All of those declarations are completely irrelevant as they do not contribute anything to the [non-existent] result of the program. You might say they "get optimised out", though the reality is that they literally have no analogue in your resulting compiled executable.
Upvotes: 0