Aquarius_Girl

Reputation: 22906

Where are expressions and constants stored if not in memory?

From The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie:

The & operator only applies to objects in memory: variables and array elements. It cannot be applied to expressions, constants, or register variables.

Where are expressions and constants stored if not in memory? What does that quote mean?

E.g.:
&(2 + 3)

Why can't we take its address? Where is it stored?
Will the answer be the same for C++ as well, since C is its parent?

This linked question explains that such expressions are rvalues, and that rvalues do not have addresses.

My question is where are these expressions stored such that their addresses can't be retrieved?

Upvotes: 47

Views: 5424

Answers (5)

aaaaaa123456789

Reputation: 5842

Consider the following function:

unsigned sum_evens (unsigned number) {
  number &= ~1; // clear the lowest bit; ~1 == 0xfffffffe when unsigned is 32 bits
  unsigned result = 0;
  while (number) {
    result += number;
    number -= 2;
  }
  return result;
}

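Now, let's play the compiler game and try to compile this by hand. I'm going to assume you're using x86 because that's what most desktop computers use. (x86 is the instruction set for Intel-compatible CPUs.) More precisely, the listing below assumes 64-bit x86 with the System V calling convention used on Linux and macOS, which passes the first integer argument in edi and returns the result in eax.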

Let's go through a simple (unoptimized) version of what this routine could look like when compiled:

sum_evens:
  and edi, 0xfffffffe ;edi is where the first argument goes
  xor eax, eax ;set register eax to 0
  cmp edi, 0 ;compare number to 0
  jz .done ;if edi = 0, jump to .done
.loop:
  add eax, edi ;eax = eax + edi
  sub edi, 2 ;edi = edi - 2
  jnz .loop ;if edi != 0, go back to .loop
.done:
  ret ;return (value in eax is returned to caller)

Now, as you can see, the constants in the code (0, 2 and ~1) actually show up as part of the CPU instructions themselves! In fact, 1 doesn't show up at all: the compiler (in this case, just me) has already calculated ~1 and uses the result directly in the code.

While you can take the address of a CPU instruction, it often makes no sense to take the address of part of one (on x86 you sometimes can, but on many other CPUs you simply cannot do this at all). Code addresses are also fundamentally different from data addresses, which is why you cannot treat a function pointer (a code address) as a regular pointer (a data address). On some CPU architectures, code addresses and data addresses are completely incompatible, although this is not the case for x86 in the way most modern OSes use it.
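A minimal C sketch of the code-versus-data distinction, assuming a hosted C99 (or later) compiler. It repeats sum_evens so the block is self-contained; the helper name call_through_pointer is hypothetical. Taking the address of a whole function is allowed and yields a function pointer, but ISO C does not define converting that pointer to a data pointer such as void *, even though some platforms permit it as an extension.

```c
/* Same function as in the answer above. */
unsigned sum_evens(unsigned number) {
    number &= ~1u;              /* clear the lowest bit */
    unsigned result = 0;
    while (number) {
        result += number;
        number -= 2;
    }
    return result;
}

/* Hypothetical helper: stores a code address and calls through it. */
unsigned call_through_pointer(unsigned n) {
    /* &sum_evens is a function pointer, i.e. a code address. */
    unsigned (*fp)(unsigned) = &sum_evens;

    /* void *data = (void *)fp;   not defined by ISO C: function pointers
       and data pointers are not required to be convertible. */
    return fp(n);
}
```

For example, call_through_pointer(6) returns 6 + 4 + 2 = 12.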

Notice also that while (number) is equivalent to while (number != 0), yet that 0 doesn't show up in the compiled code at all! It's implied by the jnz instruction (jump if not zero). This is another reason why you cannot take the address of that 0: it doesn't have one, because it's literally nowhere.

I hope this makes it clearer for you.

Upvotes: 63

Useless

Reputation: 67723

where are these expressions stored such that their addresses can't be retrieved?

Your question is not well-formed.

  • Conceptually

    It's like asking why people can discuss ownership of nouns but not verbs. Nouns refer to things that may (potentially) be owned, and verbs refer to actions that are performed. You can't own an action or perform a thing.

  • In terms of language specification

    Expressions are not stored in the first place, they are evaluated. They may be evaluated by the compiler, at compile time, or they may be evaluated by the processor, at run time.

  • In terms of language implementation

    Consider the statement

    int a = 0;
    

    This does two things: first, it declares an integer variable a. This is defined to be something whose address you can take. It's up to the compiler to do whatever makes sense on a given platform, to allow you to take the address of a.

    Secondly, it sets that variable's value to zero. This does not mean an integer with value zero exists somewhere in your compiled program. It might commonly be implemented as

    xor eax,eax
    

    which is to say, XOR (exclusive-or) the eax register with itself. This always results in zero, whatever was there before. However, there is no fixed object of value 0 in the compiled code to match the integer literal 0 you wrote in the source.

As an aside, when I say that a above is something whose address you can take, it's worth pointing out that it may not actually have an address unless you take it. For example, the eax register used in that example doesn't have an address. If the compiler can prove the program is still correct, a can live its whole life in that register and never exist in main memory. Conversely, if you use the expression &a somewhere, the compiler will take care to create some addressable space to store a's value in.
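The aside above can be illustrated with a short C sketch (both function names are hypothetical). Taking &a is what obliges the compiler to give a addressable storage, so a write through the pointer is visible when a is read. By contrast, C rejects & on a variable declared register, which is the "register variables" case in the K&R quote, regardless of where the compiler actually keeps it.

```c
/* Taking &a forces a to have an address; the write through p
   is therefore visible when a is read back. */
int write_through_pointer(void) {
    int a = 0;
    int *p = &a;
    *p = 42;
    return a;
}

/* A register variable: C forbids applying & to it, whether or not
   the compiler really keeps it in a CPU register. */
int register_example(void) {
    register int r = 7;
    /* int *q = &r;   rejected: & cannot be applied to a register variable */
    return r + 1;
}
```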


Note for comparison that I can easily choose a different language where I can take the address of an expression.

It'll probably be interpreted, because compilation usually discards these structures once the machine-executable output replaces them. For example, Python has runtime introspection and code objects.

Or I can start from LISP and extend it to provide some kind of addressof operation on S-expressions.

The key thing they both have in common is that they are not C, which as a matter of design and definition does not provide those mechanisms.

Upvotes: 42

Where are expressions and constants stored if not in memory

In some (actually many) cases, a constant expression is not stored at all. In particular, think about optimizing compilers, and see CppCon 2017: Matt Godbolt's talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”

In your particular case of some C code containing 2 + 3, most optimizing compilers would constant-fold that into 5, and that 5 might end up encoded as an immediate operand inside some machine instruction in your code segment, without a well-defined memory location of its own. If that constant 5 were a loop limit, some compilers could unroll the loop, and the constant would no longer appear in the binary code at all.
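In some contexts C goes further than optional constant folding and requires 2 + 3 to be evaluated at compile time: enumeration constants, array sizes and _Static_assert (C11) all need an integer constant expression. A minimal sketch, assuming a C11 compiler; FIVE and buffer_size are hypothetical names. None of these uses gives 2 + 3 an address.

```c
#include <stddef.h>

/* 2 + 3 is an integer constant expression, evaluated during translation. */
enum { FIVE = 2 + 3 };   /* an enumeration constant, not an object */

_Static_assert(2 + 3 == 5, "2 + 3 is evaluated at compile time"); /* C11 */

size_t buffer_size(void) {
    char buf[2 + 3];     /* array length fixed at compile time */
    return sizeof buf;   /* sizeof does not evaluate buf here */
}

/* Neither of these compiles, because neither operand is an lvalue:
   const int *p1 = &FIVE;
   int *p2 = &(2 + 3);                                              */
```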

See also this answer, etc...

Be aware that C11 is a specification written in English. Read n1570, the final draft of the C11 standard. Read also the much bigger specification of C++11 (or later).

Taking the address of a constant such as 5 or 2 + 3 is forbidden by the semantics of C (and of C++).
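If you do need an address, the usual workaround is to put the value into an object first. A minimal sketch, assuming C99 or later (five, address_of_named and via_compound_literal are hypothetical names): a const-qualified variable is still an object, so & applies to it, and a C99 compound literal such as (int){2 + 3} creates an unnamed object that also has an address. Compound literals are a C feature; standard C++ does not have them.

```c
/* A const-qualified object is still an object, so it has an address. */
static const int five = 2 + 3;

const int *address_of_named(void) {
    return &five;
}

/* C99 compound literal: (int){2 + 3} creates an unnamed object with
   automatic storage, so & can be applied to it. */
int via_compound_literal(void) {
    int *p = &(int){2 + 3};
    *p += 1;             /* modifies the unnamed object */
    return *p;
}
```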

Upvotes: 4

klutt

Reputation: 31306

It does not really make sense to take the address of an expression. The closest thing you can get is a function pointer. Expressions are not stored in the same sense as variables and objects.

Expressions are stored in the actual machine code. Of course you could find the address where the expression is evaluated, but it just doesn't make sense to do so.

Read a bit about assembly. The code that evaluates expressions is stored in the text segment, while variables are stored elsewhere, such as in the data segment or on the stack.

https://en.wikipedia.org/wiki/Data_segment

Another way to explain it is that expressions are cpu instructions, while variables are pure data.

One more thing to consider: the compiler often optimizes things away. Consider this code:

int x=0;
while(x<10)
    x+=1;

This code will probably be optimized to:

int x=10;

So what would the address of (x+=1) mean in this case? It is not even present in the machine code, so it has, by definition, no address at all.
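A minimal sketch of the two cases, assuming a C99 compiler (the function names are hypothetical). Whether the plain loop is actually collapsed depends on the compiler and optimization level. Declaring x volatile makes every access to it part of the program's observable behavior, so the compiler must perform each read and write and cannot reduce the loop to x = 10. Both functions return the same value.

```c
/* The loop from the answer. An optimizing compiler may replace it
   entirely with x = 10. */
int count_to_ten(void) {
    int x = 0;
    while (x < 10)
        x += 1;
    return x;
}

/* With volatile, each read and write of x must actually happen,
   so the loop cannot be collapsed. */
int count_to_ten_volatile(void) {
    volatile int x = 0;
    while (x < 10)
        x += 1;
    return x;
}
```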

Upvotes: 5

Lundin

Reputation: 213693

Such expressions end up as part of the machine code. An expression like 2 + 3 likely gets translated to the machine code instruction "load 5 into register A". CPU registers don't have addresses.

Upvotes: 10
