initialized data segment in C binaries running under Windows

Question

I have been trying for a long time to understand how program memory is handled by the OS (I use Windows, but I guess it is the same or very similar on Linux).

So far I know (mostly thanks to you Stack Overflow users) that local variables are stored on the stack. I also finally understand why, so that part is OK.

What I am still missing is how global variables are stored and handled, and I want to understand it at the assembly level. I have an idea of how they might be handled, but I can't be sure, because there are many things I still don't know that could make my idea impossible to implement.

My idea is that global variables are located at the end of the program code, after the last instruction. Why do I think it could work this way? Because then no extra memory or CPU time would be wasted: the variables and their default values would be copied into RAM by the OS when the program is executed.

Why do I think this would be possible? Because, if I am not wrong, on modern x86 OSes every program gets its own address space starting from 0. That way the compiler easily knows the address of each global variable: it knows the length of the program, so it can calculate the variable's position in the address space.

Why do I think this might be all wrong? Because I have already wondered why local variables are created on the stack instead of in this same way. And when you have routines in ELF format, you have precompiled routines with only unresolved addresses for variables.

Also, I read in some article that allocating memory using malloc expands the heap. Because I think of the heap as the space after the program code, that would be a problem, since it would grow into the stack. Otherwise the stack would need to be located at the end of the process address space, but that would be a terrible waste of memory.

I have tried to describe my point of view as well as I could, so I hope you can see where I made mistakes and help me fill in the knowledge I am missing. Thanks.

Billy ONeal · Accepted Answer

The memory usage of a program is not a function of the language in which that program is written. You can write code that uses a ton of memory in C#, and you can do the same in C.

That said, I will try to address some of what you're asking:

how global variables are stored and handled

They are given memory addresses. When you use a global, the compiler just uses that known address. (What, did you expect a complex answer?)
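To make that concrete, here is a minimal sketch (the names initialized_global, zeroed_global and the helper functions are made up for illustration). It assumes nothing beyond standard C: the global has one address for the whole run, while each active call gets its own copy of a local at a different address. The actual addresses are not predictable and may differ between runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int initialized_global = 42;   /* initial value typically comes from the executable image */
int zeroed_global;             /* no initializer, so it starts as 0 */

/* Address of the global: the same on every call within one run. */
uintptr_t global_address(void) {
    return (uintptr_t)&initialized_global;
}

/* Record the address of one local variable per recursion level. */
void record_local_addresses(uintptr_t *out, int depth, int max_depth) {
    volatile int local = depth;
    assert(depth < max_depth);
    out[depth] = (uintptr_t)&local;
    if (depth + 1 < max_depth) {
        record_local_addresses(out, depth + 1, max_depth);
    }
    local = local + 1;  /* keeps this frame's local alive across the nested call */
}

/* Print the addresses so they can be compared by eye. */
void print_addresses(void) {
    uintptr_t locals[3];
    record_local_addresses(locals, 0, 3);
    printf("global: 0x%llx\n", (unsigned long long)global_address());
    for (int i = 0; i < 3; i++) {
        printf("local at depth %d: 0x%llx\n", i, (unsigned long long)locals[i]);
    }
}
```

Calling print_addresses() from main shows one address for the global and a different address for the local at each depth.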

that global variables are located at the end of the program code

On some architectures that may be the case, but it need not be. On Windows (using the Portable Executable file format) code and globals are not related in any way and may be mapped to completely different, arbitrary locations. In fact, this is most likely not the case on more recent architectures, which discourage placing code where data lives for security reasons: you don't want a buffer overrun to be able to overwrite your program code.

if I am not wrong, on modern x86 OSes, every program gets its own address space starting from 0

You are not wrong in theory, but you are wrong in practice. Even if linkers could really do this, few would, simply because 0 is used as the null constant. Usually, however, the issue is that dynamic libraries or other items consume pieces of your process's address space well before your code is actually loaded (for example, a reference to the file where your code is located, or a block of memory containing the command line passed to your program).

I read that allocating memory using malloc expands heap

Well, you're assuming that there is only one heap. On Windows at least, each DLL usually has its own heap, and you can create heaps on the fly, at will. The way heaps are classically explained in Computer Science courses usually assumes a system with no base operating system or virtual memory in play.

You are confusing memory with address space here for modern processors. The addresses of things often have very little to do with where they are physically stored. The Wikipedia article on Virtual Memory might make things a bit clearer for you. Good luck!

EDIT:

the PE exe file actually has information about global variables that can be distinguished from other data

Not exactly. The PE file format has a section where static data is stored, and that region of the file is memory mapped. The code knows where in that big chunk the specific globals it needs are located.

That the OS actually maps them to, let's say, the "best" space available

Modern processors use a flat memory model. It's just as easy to access any one address as it is to access any other.

I always thought that compiled code is no further changed by OS at runtime

It isn't (well, for the most part; the reasons it might change are a whole can of worms in and of themselves). To access the global, the code itself needs to know the base address at which it is loaded. It can calculate where the data block of the PE file is loaded from that, for the most part. That said, compilers are free to put globals pretty much anywhere; the fact that the PE spec has a place for initialized data does not mean the compiler has to use it (MinGW, for example, I believe does not use that area).
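The base-address calculation can be sketched roughly like this. PE files express locations as relative virtual addresses (RVAs), i.e. offsets from the image base, so the arithmetic is just an addition. SectionInfo here is a simplified, hypothetical stand-in, not the real section header layout, and the numbers used with it are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical description of a section; NOT the real PE header layout. */
typedef struct {
    const char *name;
    uint32_t rva;    /* offset of the section from the image base */
    uint32_t size;   /* size of the section in bytes */
} SectionInfo;

/* Virtual address = base address the image was loaded at + RVA. */
uint64_t rva_to_va(uint64_t image_base, uint32_t rva) {
    return image_base + rva;
}

/* Address of a global that sits `offset` bytes into the given section. */
uint64_t global_va(uint64_t image_base, const SectionInfo *section, uint32_t offset) {
    assert(offset < section->size);
    return rva_to_va(image_base, section->rva + offset);
}
```

With a made-up data section at RVA 0x3000, a global 0x10 bytes into it ends up at 0x403010 if the image is loaded at 0x400000, and at 0x10003010 if it is loaded at 0x10000000. The offset within the image stays the same; only the base moves.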

First, does the .exe contain information about the stack size needed?

Yes, there are settings controlling both the reserved size and the committed size of the stack. Because stack overflows can be handled safely on Windows, the stack is usually only 1 MB; on *nix machines it's usually 8 MB or more.

And is stack size limited?

Not as far as I am aware. That said, there are of course practical limitations. First and foremost, address space reserved for the stack will not be usable for anything other than the stack. There are also large portions of the address space that are reserved by the kernel for various uses, not to mention the actual code and data on which your program operates. If you are using more than 1 MB of stack, you should consider using a heap-allocated stack for your data and switching to an iterative solution, or seriously rethinking how your program operates. 1 MB of stack is much, much more than is commonly used.
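As a rough sketch of what "a heap-allocated stack and an iterative solution" can look like, here is a tree walk that keeps its pending nodes in a malloc'd array instead of recursing. The Node type and tree_sum_iterative are invented for this example; the same pattern applies to any recursion that risks going too deep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *left, *right;
} Node;

/* Sum every value in the tree without recursion. Pending nodes are kept in a
   heap-allocated array that grows on demand, so the depth of the tree does not
   consume call-stack space. Returns 0 on success, -1 if allocation fails. */
int tree_sum_iterative(const Node *root, long *sum_out) {
    size_t cap = 16, len = 0;
    long sum = 0;
    const Node **stack;

    assert(sum_out != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL) return -1;

    if (root != NULL) stack[len++] = root;
    while (len > 0) {
        const Node *n = stack[--len];   /* pop */
        sum += n->value;

        /* Make room for up to two children before pushing them. */
        if (len + 2 > cap) {
            size_t new_cap = cap * 2;
            const Node **grown = realloc(stack, new_cap * sizeof *grown);
            if (grown == NULL) { free(stack); return -1; }
            stack = grown;
            cap = new_cap;
        }
        if (n->left != NULL)  stack[len++] = n->left;
        if (n->right != NULL) stack[len++] = n->right;
    }

    free(stack);
    *sum_out = sum;
    return 0;
}
```

The explicit stack is limited by available heap memory rather than by the stack size recorded in the executable.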

And second, is there any article you know of that contains this information, and/or describes what the PE format contains and how it is encoded, so that it can be recognized when disassembling an .exe file?

You can read the PE Spec: http://www.microsoft.com/whdc/system/platform/firmware/pecoff.mspx

On modern systems, you can essentially forget about knowing exactly where code segments and data are located, both physically and virtually. The processor does not care about or enforce anything like this, so there is no reason any operating system or program is forced to use any particular memory organization. In particular, the concept of a heap in Windows is much different from what is typically taught in Computer Science courses. In Windows (and other modern OSs), the heap is nothing more than a bunch of memory doled out by the OS. The location of that memory, however, is completely variable. Ask the OS for one block and you might get it at 0x00005556, and you might get the next block at 0xFFFF890. There's no reason for a distinction because the processor underneath simply does not care.
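A small sketch of that last point, using plain malloc as a stand-in for "asking for a block" (malloc is the C runtime's allocator, which in turn gets its memory from the OS). The helper names are made up. The only thing the program can rely on is that the two blocks do not overlap; nothing is promised about where they land or how far apart they are.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the byte ranges [a, a+size) and [b, b+size) do not overlap. */
int blocks_disjoint(const void *a, const void *b, size_t size) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa + size <= pb || pb + size <= pa;
}

/* Allocate two blocks of `size` bytes (size > 0) and print where they landed.
   Returns 0 on success, -1 if either allocation fails. */
int show_two_blocks(size_t size) {
    void *first = malloc(size);
    void *second = malloc(size);
    if (first == NULL || second == NULL) {
        free(first);   /* free(NULL) is a no-op */
        free(second);
        return -1;
    }
    /* The blocks never overlap, but their relative position is unspecified. */
    assert(blocks_disjoint(first, second, size));
    printf("first:  %p\nsecond: %p\n", first, second);
    free(first);
    free(second);
    return 0;
}
```

Running show_two_blocks(4096) a few times, or with different sizes, usually shows addresses that are neither adjacent nor in any particular order.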
