Mohamed Adel Anis

Reputation: 11

Compilation process and variables' addresses

I was asked this question in a technical interview: "What are the stages of the compilation process in C?"

I answered:

  1. preprocessor
  2. compiler
  3. assembler
  4. linker

Then he continued:

"After which of these stages are all the variables in the program located and given addresses? That is, if there are two variables A and B, after which stage will A and B have addresses in memory?"

(I think he meant: in which of the files produced by these stages.)

I finally answered that it is after the linker, since extern variables must be resolved to their definitions, but I have no idea whether my answer was right or wrong.

So hopefully someone can help me understand this question.

Upvotes: 0

Views: 569

Answers (2)

old_timer

Reputation: 71566

There is no single answer to the address question, and depending on the platform, your variable may have more than one address.

When you compile, depending on the variable, it may have been allocated an offset relative to the stack pointer, but the stack pointer itself is not known until that function runs (usually). For .data and .bss, the compiler leaves a mechanism for reaching the variables, and that mechanism depends on the compiler and target.

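unsigned int x = 5;                      /* initialized global: .data */
unsigned int y;                          /* uninitialized global: .bss */
unsigned int more_fun ( unsigned int );  /* defined elsewhere, resolved by the linker */
unsigned int fun ( unsigned int z )
{
    unsigned int a;                      /* local: register or stack slot */
    a = x + 1;
    a = a + more_fun(y) + y + z;
    return(a);
}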

00000000 <fun>:
   0:   e92d4070    push    {r4, r5, r6, lr}
   4:   e1a06000    mov r6, r0
   8:   e59f3028    ldr r3, [pc, #40]   ; 38 <fun+0x38>
   c:   e5934000    ldr r4, [r3]
  10:   e59f5024    ldr r5, [pc, #36]   ; 3c <fun+0x3c>
  14:   e5950000    ldr r0, [r5]
  18:   ebfffffe    bl  0 <more_fun>
  1c:   e5953000    ldr r3, [r5]
  20:   e0844003    add r4, r4, r3
  24:   e2844001    add r4, r4, #1
  28:   e0844006    add r4, r4, r6
  2c:   e0840000    add r0, r4, r0
  30:   e8bd4070    pop {r4, r5, r6, lr}
  34:   e12fff1e    bx  lr

In this case z is not stored on the stack; instead, a register (r6) is saved on the stack and z is kept in that register, so it doesn't have an address, relative or otherwise. x and y do have addresses, to be filled in later by the linker; that is how this compiler and target solve the problem. This is obviously optimized. a does not have an address either; it is handled in a register. Had I not optimized, a and z would have stack-pointer-relative storage, and the globals would stay global.
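To make "filled in later by the linker" concrete, here is a toy model of the two literal-pool words at offsets 0x38 and 0x3c: the object file holds zero placeholders plus records saying which symbol belongs in each word, and the link step patches in the final addresses. The structs and function names are hypothetical and real ELF relocation entries carry more detail (a type, possibly an addend); the two addresses are the ones the linked disassembly shows for x and y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record: "put the absolute address
   of symbol `name` into the 32-bit word at byte offset `offset`". */
struct reloc {
    uint32_t offset;
    const char *name;
};

struct symbol {
    const char *name;
    uint32_t address;   /* final address chosen by the linker */
};

/* fun's literal pool (offsets 0x38 and 0x3c) as it sits in the object
   file: placeholders the compiler left as zero. */
#define POOL_BASE 0x38u
static uint32_t pool[2] = { 0, 0 };

static const struct reloc relocs[] = {
    { 0x38, "x" },
    { 0x3c, "y" },
};

/* Addresses as they appear in the linked disassembly. */
static const struct symbol symtab[] = {
    { "x", 0x0021004cu },
    { "y", 0x00210050u },
};

static uint32_t lookup(const char *name)
{
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].address;
    assert(!"undefined symbol");  /* a real linker reports an error here */
    return 0;
}

/* The "link" step: patch every placeholder with its symbol's address. */
static void apply_relocations(void)
{
    for (size_t i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        uint32_t index = (relocs[i].offset - POOL_BASE) / 4u;
        pool[index] = lookup(relocs[i].name);
    }
}
```

After `apply_relocations()`, `pool` holds 0x0021004c and 0x00210050, matching the words at 0x200044 and 0x200048 in the linked output.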

Once linked, though:

00200008 <more_fun>:
  200008:   e12fff1e    bx  lr

0020000c <fun>:
  20000c:   e92d4070    push    {r4, r5, r6, lr}
  200010:   e1a06000    mov r6, r0
  200014:   e59f3028    ldr r3, [pc, #40]   ; 200044 <fun+0x38>
  200018:   e5934000    ldr r4, [r3]
  20001c:   e59f5024    ldr r5, [pc, #36]   ; 200048 <fun+0x3c>
  200020:   e5950000    ldr r0, [r5]
  200024:   ebfffff7    bl  200008 <more_fun>
  200028:   e5953000    ldr r3, [r5]
  20002c:   e0844003    add r4, r4, r3
  200030:   e2844001    add r4, r4, #1
  200034:   e0844006    add r4, r4, r6
  200038:   e0840000    add r0, r4, r0
  20003c:   e8bd4070    pop {r4, r5, r6, lr}
  200040:   e12fff1e    bx  lr
  200044:   0021004c    eoreq   r0, r1, r12, asr #32
  200048:   00210050    eoreq   r0, r1, r0, asr r0

Disassembly of section .data:

0021004c <x>:
  21004c:   00000005    andeq   r0, r0, r5

Disassembly of section .bss:

00210050 <y>:
  210050:   00000000    andeq   r0, r0, r0

x and y now have known, fixed addresses. So when you see an answer or comment here saying "link time", that is what they are talking about. In this case the compiler didn't end up needing any stack-based variables; those would technically be determined at run time. With a trivial program and, say, only one call to the function, where they end up could be predetermined and effectively fixed at link time, but don't assume that. Assume that non-static locals are technically determined at run time.
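The static vs. non-static local distinction can be checked directly, assuming any hosted C compiler: a `static` local lives in .data/.bss like a global, so every call sees the same link-time address, while an ordinary local gets a separate object for each live call, placed wherever the stack pointer is at that moment. `nest` and the arrays are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CALLS 3

static uintptr_t static_seen[CALLS];  /* &counter observed at each depth */
static uintptr_t auto_seen[CALLS];    /* &scratch observed at each depth */

/* Hypothetical helper: nest CALLS calls deep and record where each kind
   of local lives on each call. */
static int nest(int depth)
{
    static int counter;            /* one object for the whole program */
    volatile int scratch = depth;  /* a new object for each live call  */

    assert(depth >= 0 && depth < CALLS);
    counter++;
    static_seen[depth] = (uintptr_t)&counter;
    auto_seen[depth] = (uintptr_t)&scratch;
    if (depth + 1 < CALLS)
        nest(depth + 1);
    return counter + scratch;      /* reading scratch after the call keeps
                                      this frame's object live */
}
```

Because the outer calls' `scratch` objects are still alive while the inner calls run, C requires them to have distinct addresses; `counter` always has the same one. The actual numbers vary by target and run.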

Now, had I built with -fPIC, the access to x and y would be doubly indirect: there would be a read of the global offset table, and within that is the address of the variable itself. The initial addresses ARE determined at link time, but they can be modified at load time to be somewhere else.
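A toy C model of that double indirection, assuming nothing about any real ABI: the code reaches x only through a slot in a table standing in for the global offset table, and a loader step may rewrite the slot. The names `got`, `GOT_X`, `load_at`, and `read_x` are hypothetical; real -fPIC code locates the GOT PC-relatively and the dynamic linker fills it in.

```c
#include <assert.h>

/* Two places the data might end up on different runs (hypothetical). */
static unsigned int data_run1 = 5;
static unsigned int data_run2 = 5;

/* Stand-in for the global offset table: one pointer per global. */
enum { GOT_X, GOT_SIZE };
static unsigned int *got[GOT_SIZE];

/* Hypothetical loader step: record where x ended up for this run. */
static void load_at(unsigned int *where_x_lives)
{
    got[GOT_X] = where_x_lives;
}

/* Position-independent access: first read the GOT slot (the address of
   x), then read x itself. The code never embeds x's address directly. */
static unsigned int read_x(void)
{
    unsigned int *slot = got[GOT_X];  /* indirection 1: the GOT entry */
    assert(slot != 0);
    return *slot;                     /* indirection 2: x itself      */
}
```

`read_x` is identical no matter where the data lands; only the table entry changes, which is what lets the same code run at different load addresses.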

And then there is virtual vs. physical. If you are running on an operating system, it likely (though it doesn't have to) uses an MMU to let the program think it is in its own zero-based memory space (the program loads at, say, 0x8000 as far as the program and toolchain are concerned), but there is a physical address that can vary for each load. Even worse, if the program is swapped out it could come back somewhere else; so long as the virtual mapping is done right, the physical address can differ at load time, or at run time after a swap.
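A toy model of the virtual/physical split, assuming 4 KiB pages and a single-level table (real MMUs use multi-level tables managed by the OS): the program keeps using the same virtual address, while the page table decides which physical frame backs it, and that mapping can change between loads or after a swap. `page_table`, `translate`, `map_page`, and the frame numbers are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  1024u               /* covers virtual 0x000000..0x3FFFFF */

/* Hypothetical one-level page table: virtual page number -> physical frame. */
static uint32_t page_table[NUM_PAGES];

/* What the MMU does on every access: split the virtual address into a
   page number and an offset, then swap the page number for a frame. */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;          /* which virtual page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);  /* position in page   */

    assert(vpn < NUM_PAGES);
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}

/* Map one virtual page to a physical frame, as an OS might at load time
   or when bringing a swapped-out page back in. */
static void map_page(uint32_t vaddr, uint32_t frame)
{
    assert((vaddr >> PAGE_SHIFT) < NUM_PAGES);
    page_table[vaddr >> PAGE_SHIFT] = frame;
}
```

For example, mapping x's virtual address 0x0021004c to frame 0x1234 gives physical 0x0123404c; remapping the same page to frame 0x42 gives 0x0004204c, while the program still uses 0x0021004c.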

That is the problem with questions like this in an interview or a college test. Sometimes the person asking is looking for a specific answer, like "linker", which is true in a great number of situations but has exceptions. Or perhaps the person asking knows more than just enough to be dangerous and is looking for load time, link time, or run time, or for a longer explanation.

There are further exceptions to the answers discussed thus far. So it is likely that the person asking had a specific answer or reason in mind, and you very likely cannot read their mind and get it right. That makes it an unfair/bad question; I would hesitate to work for a place that asks such poor questions. Unless it is the latter: they know all the nuances and are trying to see whether you know them too, for some reason. It could be a weed-out question to see who stumbles, and may have nothing to do with their product or development.

I recommend you get or build some cross compilers for a few of the different GNU-supported targets (say PDP-11, not joking, ARM, x86, and maybe another), try experiments like the above or disassemble actual projects you are working on, and see how the tools work. If given the freedom in the interview, you can say "let me show you", get on a laptop, and bang out a simple example. If THEY are not following YOU and are getting confused, say thank you and look for a different employer. At the same time, we do all-day interviews with several of us taking turns with the candidate one-on-one, and when we meet afterward it is not uncommon to hear "I asked this question and this was their answer", and others in the room say "I don't even know what you are trying to ask there". So sometimes it is just a bad question.

I can't imagine what kind of job would really care about such a thing. Why would this be a relevant interview question? Is this for a toolchain developer position?

EDIT

Short answer: there is more than one correct answer, and at the same time that means the answers can contradict each other.

At compile time, stack-pointer-relative offsets for local items are determined. But the stack pointer itself, and thus the actual address those offsets produce, is a run-time thing for that function.

At link time, addresses are applied to the remaining items, including variables. So link time is a correct answer.

It is possible to have changes made at load time (position-independent code, for example), so load time is a correct answer.

And then there are of course virtual vs. physical addresses: the physical addresses behind the MMU are assigned at load time and can change at run time.

Upvotes: -1

IceBerg0

Reputation: 59

I just want to add some clarification to user3386109's comment:

  1. In the case of a bare-metal compiler, the definitive address is assigned at link time.
  2. In the case of a program intended to run on an OS (Linux, Windows, RT-Linux, ...), the linker assigns relocatable addresses and the definitive ones are given when the program is loaded. But I don't think loading is really considered part of the compilation process; I would rather say it is part of the program initialization process.
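The "relocatable address, then definitive address at load" step can be sketched as simple rebasing, under the simplifying assumption that the linker emits addresses relative to a link base of 0 plus a list of which words hold addresses, and the loader adds the real load base to each. `image`, `address_words`, `rebase`, and the values are hypothetical; real formats (ELF dynamic relocations, PE base relocations) differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linked image: two words holding the addresses of x and y,
   expressed relative to a link-time base of 0. */
static uint32_t image[2] = { 0x0001004cu, 0x00010050u };

/* Indices of the words in `image` that contain addresses and must be
   adjusted when the image is loaded somewhere other than its link base. */
static const size_t address_words[] = { 0, 1 };

/* Loader step: add (load base - link base) to every address word. */
static void rebase(uint32_t link_base, uint32_t load_base)
{
    uint32_t delta = load_base - link_base;  /* wraps mod 2^32 if negative */

    for (size_t i = 0; i < sizeof address_words / sizeof address_words[0]; i++) {
        assert(address_words[i] < sizeof image / sizeof image[0]);
        image[address_words[i]] += delta;
    }
}
```

Calling `rebase(0, 0x00200000)` turns the two words into 0x0021004c and 0x00210050; a different load base on the next run would produce different definitive addresses from the same linked file.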

Hope it helps.

Upvotes: 2
