Reputation: 75

Questions about the Memory Layout of a C program

I have some questions about memory layout of C programs.

Text Segment

Here is my first question:

When I searched for the text segment (or code segment) I read that "Text segment contain executable instructions". But what are executable instructions for any function? Could you give some different examples?

I also read that "Text segment is sharable so that only a single copy needs to be in memory for frequently executed programs such as text editors, the C compiler, etc.", but I couldn't make a connection between C programs and "text editors".

What should I understand from this statement?

Initialized Data Segment

It is said that the "Initialized Data Segment" contains the global variables and static variables, but I also read that const char* string = "hello world" causes the string literal "hello world" to be stored in the initialized read-only area and the character pointer variable string in the initialized read-write area. Is char* string stored in the read-only area or the read-write area? Since both are mentioned here, I'm a bit confused.

Stack

From what I understand, the stack contains the local variables. Is this right?

Upvotes: 2

Answers (4)

Krishan

Reputation: 337

The actual layout of a program's in-memory image is left entirely up to the operating system, and often to the program itself as well. Conceptually, though, we can think of two segments of memory for a running program[1].

1. Text or Code Segment - Contains compiled program code.
2. Data Segment - Contains data (global, static, and local) both initialized and uninitialized. Data segment can further be sub-categorized as follows:

2.1 Initialized Data Segments
2.2 Uninitialized Data Segments
2.3 Stack Segment
2.4 Heap Segment

The initialized data segment stores all global, static, constant, and external variables (declared with the extern keyword) that are given an initial value in the source.

The uninitialized data segment, or .bss segment, stores all uninitialized global, static, and external variables (declared with the extern keyword).

The stack segment stores all local variables. It is also used for passing arguments to functions, along with the return address of the instruction to be executed after the function call is over.

The heap segment is also part of RAM; it is where dynamically allocated variables are stored.

Coming to your first question: if you are familiar with function pointers, you know that a function name evaluates to the address of the function (its entry point). The instructions stored at that address are machine code, usually shown in human-readable form as assembly. The instruction set varies from architecture to architecture.

Text or code section is shareable - If more than one running process belongs to the same program, the common compiled code need not be loaded into memory separately for each of them. For example, if you have opened two .doc documents, there will be two processes for them, but there will definitely be some common code used by both processes.

Upvotes: 1

Vishal Chovatiya

Reputation: 99

The stack segment is the area where local variables are stored. Local variables here means all variables declared inside any function of your C program, including main().

When a function is called, a stack frame is created; when the function returns, that stack frame is destroyed, along with all local variables of that function.

A stack frame contains data such as the return address, the arguments passed to the function, its local variables, and any other information needed by the invoked function.

A “stack pointer (SP)” keeps track of the top of the stack: each push or pop operation adjusts the stack pointer to the next or previous address.

You can refer to this link for practical info: http://www.firmcodes.com/memory-layout-c-program-2/

Upvotes: 0

slugonamission

Reputation: 9642

The text segment contains the actual code of your program, i.e. the machine code emitted by your compiler. The idea of the last statement is that your C program and, say, a text editor are exactly the same kind of thing; both are just machine code instructions executing from memory.

For example, we'll take the following code, and a hypothetical architecture I've just thought up now because I can't remember x86 assembly.

while(i != 10)
{
    x -= 5;
    i++;
}

This would translate to the following instructions

LOOP_START:
CMP eax, 10    # EAX contains i. Is it 10?
JZ  LOOP_END   # If it's 10, exit the loop
SUB ebx, 5     # Otherwise, subtract 5 from EBX (x)
ADD eax, 1     # And add 1 to i
JMP LOOP_START # And then go to the top of the loop.

LOOP_END:
# Do something else

These are low-level operations that your processor can understand. These would then be translated into binary machine code, which is then stored in memory. The actual data stored might be 5, 2, 7, 6, 4, 9, for example, given a mapping between operation and opcode that I just thought up. For more information on how this actually happens, look up the relationship between assembler and machine code.

-- Ninja-edit - if you take RBK's comment above, you can view the actual instructions which make up your application using objdump or a similar disassembler. There's one in Visual Studio somewhere, or you could use OllyDbg or IDA on Windows.

Because the actual instructions of your program should be read-only, the text segment doesn't need to be replicated for multiple runs of your program since it should always be the same.

As for your question on the data segment, a global char* string declared without an initializer will actually be stored in the .bss segment. This is an area of memory that is cleared before your program runs (by the crt0 or equivalent) unless you give GCC a flag that I can't remember off-hand. The .bss segment is read-write.

Yes, the stack segment contains your local variables. In reality, it stores what are called "stack frames". One of these is created for each function you call, and they stack on top of each other. It contains stuff like the local variables, as you said, and other useful bits like the address that the function was called from, and other useful data so that when the function exits, the previous state can be reinstated. For what is actually contained on a stack frame, you need to delve into your architecture's ABI (Application Binary Interface).

Upvotes: 3

Mats Petersson

Reputation: 129374

The text segment is often also called "code" ("text" tends to be the Unix/Linux name; other OSes don't necessarily use that name).

And it is shareable in the sense that if you run TWO processes that both execute the C compiler, or you open the text editor in two different windows, both of those share the same "text" section, because it doesn't change while the code is running (self-modifying code is not allowed in the text segment).

The initialized string value (the literal "hello world") is stored in either "ro-data" or "text", depending on the compiler. And yes, it's not writable.

If string is a global variable, it will end up in "initialized data", which will hold the address of the "hello world" message as the value of string. The const part refers to the fact that what the pointer points at is constant, so we can still change the pointer itself with string = "foo bar"; later in the code.

The stack is, indeed, used for local variables and, typically, the call stack (where the code returns to after it finishes the current function).