Reputation: 22906

Where are expressions and constants stored if not in memory?

From The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie:

The & operator only applies to objects in memory: variables and array elements. It cannot be applied to expressions, constants, or register variables.
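To make the quoted rule concrete, here is a minimal C sketch of which operands & accepts. The names counter, table, address_of_variable, and address_of_element are hypothetical, chosen only for illustration; the rejected cases are left as comments because they would not compile.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static int counter = 0;           /* a variable: an object in memory */
static int table[3] = {1, 2, 3};  /* each array element is an object too */

/* These compile, because & is applied to objects. */
int *address_of_variable(void) { return &counter; }
int *address_of_element(size_t i) { return &table[i]; }

/* None of these compile in C, because the operand is not an object:
 *
 *   int *p = &(2 + 3);        an expression
 *   int *q = &5;              a constant
 *   register int r = 0;
 *   int *s = &r;              a register variable
 */
```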

Where are expressions and constants stored if not in memory? What does that quote mean?

For example:
```
&(2 + 3)
```

Why can't we take its address? Where is it stored?
Will the answer be the same for C++, since C is its parent?

A linked question explains that such expressions are rvalues and that rvalues do not have addresses.

My question is where are these expressions stored such that their addresses can't be retrieved?

Upvotes: 47

Answers (5)

aaaaaa123456789

Reputation: 5842

Consider the following function:

```
unsigned sum_evens(unsigned number) {
  number &= ~1; // clear bit 0; ~1 becomes 0xfffffffe for a 32-bit unsigned
  unsigned result = 0;
  while (number) {
    result += number;
    number -= 2;
  }
  return result;
}
```

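Now, let's play the compiler game and try to compile this by hand. I'm going to assume x86, because that's what most desktop computers use (x86 is the instruction set of Intel-compatible CPUs); specifically 64-bit x86 with the calling convention used on Linux and macOS, where the first argument is passed in edi and the return value goes in eax.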

Let's go through a simple (unoptimized) version of what this routine could look like when compiled:

```
sum_evens:
  and edi, 0xfffffffe  ; edi is where the first argument goes
  xor eax, eax         ; set register eax to 0
  cmp edi, 0           ; compare number to 0
  jz .done             ; if edi = 0, jump to .done
.loop:
  add eax, edi         ; eax = eax + edi
  sub edi, 2           ; edi = edi - 2
  jnz .loop            ; if edi != 0, go back to .loop
.done:
  ret                  ; return (value in eax is returned to caller)
```

Now, as you can see, the constants in the code (0, 2, and ~1) are encoded directly in the CPU instructions! In fact, 1 itself doesn't show up at all; the compiler (in this case, just me) has already calculated ~1 and put the result in the code.
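As a small check of the claim that the compiler calculates ~1 itself, the C11 sketch below uses constant expressions in places where ISO C requires them to be evaluated at compile time: an array size and a static assertion. It uses ~1u rather than ~1 to sidestep signed-representation details, the second assertion assumes a 32-bit unsigned int (as the answer does), and buffer and buffer_size are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Integer constant expressions are evaluated by the compiler: they can size
   an array or appear in a static assertion, so nothing is left to compute
   at run time. */
static char buffer[2 + 3];             /* the array size is fixed at 5 */
_Static_assert(sizeof buffer == 5, "2 + 3 was folded to 5");

#if UINT_MAX == 0xffffffffu            /* assumes a 32-bit unsigned int */
_Static_assert(~1u == 0xfffffffeu, "~1u was folded to 0xfffffffe");
#endif

/* Hypothetical helper so the folded size can be inspected at run time. */
size_t buffer_size(void) { return sizeof buffer; }
```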

While you can take the address of a CPU instruction, it often makes no sense to take the address of part of one (on x86 you sometimes can, but on many other CPUs you simply cannot). Code addresses are also fundamentally different from data addresses, which is why you cannot treat a function pointer (a code address) as a regular pointer (a data address). On some CPU architectures, code addresses and data addresses are completely incompatible, although this is not the case for x86 in the way most modern OSes use it.
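In C terms, a minimal sketch of the code-address side: applying & to a function yields a value of function pointer type, and ISO C defines no conversion from it to a data pointer such as void *, so the sketch never attempts one. The names unary_fn, twice, code_address, and apply are hypothetical.

```c
/* A function designator gives a code address, which belongs in a function
   pointer type rather than in a data pointer. */
typedef unsigned (*unary_fn)(unsigned);

/* Hypothetical example function and helpers. */
static unsigned twice(unsigned x) { return 2 * x; }

unary_fn code_address(void) { return &twice; }   /* & applied to a function */

unsigned apply(unary_fn f, unsigned x) { return f(x); }  /* call through the code address */
```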

Notice that while (number) is equivalent to while (number != 0). That 0 doesn't show up in the loop's compiled code at all! It's implied by the jnz instruction (jump if not zero), which tests the zero flag set by the preceding sub. This is another reason why you cannot take the address of that 0: it doesn't have one, because it's literally nowhere.
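Written out in C, the implicit comparison looks like this. The function name sum_evens_explicit is hypothetical; it behaves exactly like sum_evens above, and a compiler will often (though nothing guarantees it) emit the same machine code for both spellings.

```c
/* sum_evens with the implicit comparison written out;
   while (number) and while (number != 0) mean exactly the same thing. */
unsigned sum_evens_explicit(unsigned number) {
    number &= ~1u;              /* clear bit 0: round down to an even number */
    unsigned result = 0;
    while (number != 0) {       /* this 0 need not exist anywhere in the binary */
        result += number;
        number -= 2;
    }
    return result;
}
```

For example, sum_evens_explicit(11) first rounds 11 down to 10 and then adds 10 + 8 + 6 + 4 + 2.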

I hope this makes it clearer for you.

Upvotes: 63

Useless

Reputation: 67723

where are these expressions stored such that their addresses can't be retrieved?

Your question is not well-formed.

Conceptually

It's like asking why people can discuss ownership of nouns but not verbs. Nouns refer to things that may (potentially) be owned, and verbs refer to actions that are performed. You can't own an action or perform a thing.

In terms of language specification

Expressions are not stored in the first place, they are evaluated. They may be evaluated by the compiler, at compile time, or they may be evaluated by the processor, at run time.
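If you do need an address for the value of 2 + 3, the only option is to ask for an object to hold it. A minimal sketch, with hypothetical function names, showing two ways: a named variable, and a C99 compound literal (an unnamed object; compound literals are C, not standard C++).

```c
/* Hypothetical helpers. 2 + 3 has no address until its value is put in an
   object; here the object is either a named variable or a compound literal. */
int *stored_in_variable(void) {
    static int five = 2 + 3;    /* static, so the pointer stays valid after return */
    return &five;
}

int read_through_compound_literal(void) {
    int *p = &(int){2 + 3};     /* an unnamed, addressable int holding 5 */
    return *p;                  /* the object lives until the end of this block */
}
```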

In terms of language implementation

Consider the statement
```
int a = 0;
```
This does two things: first, it declares an integer variable a. This is defined to be something whose address you can take. It's up to the compiler to do whatever makes sense on a given platform to allow you to take the address of a.

Secondly, it sets that variable's value to zero. This does not mean an integer with value zero exists somewhere in your compiled program. It is commonly implemented as
```
xor eax,eax
```
which is to say, XOR (exclusive-or) the eax register with itself. This always results in zero, whatever was there before. However, there is no fixed object of value 0 in the compiled code to match the integer literal 0 you wrote in the source.

As an aside, when I say that a above is something whose address you can take, it's worth pointing out that it may not really have an address unless you take it. For example, the eax register used in that example doesn't have an address. If the compiler can prove the program is still correct, a can live its whole life in that register and never exist in main memory. Conversely, if you use the expression &a somewhere, the compiler will take care to create some addressable space to store a's value in.
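A minimal sketch of the contrast, with hypothetical function names. In sum_in_register nothing takes the address of total, so an optimizing compiler is free to keep it in a register for its whole life. In sum_via_pointer, &total is passed to another function, so total needs an addressable home, at least conceptually; after inlining add_into, an optimizer may still keep it in a register, since only the program's observable behavior has to be preserved.

```c
/* Hypothetical helpers. Nothing takes &total here, so total can live
   entirely in a register. */
int sum_in_register(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += v[i];
    return total;
}

static void add_into(int *dst, int x) { *dst += x; }

/* Here &total escapes into add_into, so total must be addressable
   (unless the optimizer inlines add_into and removes the need). */
int sum_via_pointer(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        add_into(&total, v[i]);
    return total;
}
```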

Note for comparison that I can easily choose a different language where I can take the address of an expression.

Such a language will probably be interpreted, because compilation usually discards these structures once the machine-executable output replaces them. For example, Python has runtime introspection and code objects.

Or I can start from LISP and extend it to provide some kind of addressof operation on S-expressions.

The key thing they both have in common is that they are not C, which as a matter of design and definition does not provide those mechanisms.

Upvotes: 42

Basile Starynkevitch

Reputation: 1

Where are expressions and constants stored if not in memory

In some (actually many) cases, a constant expression is not stored at all. In particular, think about optimizing compilers, and see Matt Godbolt's CppCon 2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”.

In your particular case of some C code having 2 + 3, most optimizing compilers would have constant-folded that into 5, and that constant 5 might be encoded directly inside some machine instruction (as a bitfield of the instruction) in your code segment, without having a well-defined memory location of its own. If that constant 5 were a loop limit, some compilers could unroll the loop, and the constant would no longer appear in the binary code at all.
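A minimal sketch of the loop-limit case. LIMIT and sum_first_limit are hypothetical names; an enumeration constant is a name for a value rather than an object, so &LIMIT would not compile. Whether a particular compiler actually unrolls this loop depends on the compiler and its optimization settings.

```c
/* An enumeration constant is a name for a value, not an object,
   so &LIMIT would not compile. */
enum { LIMIT = 2 + 3 };

/* With a constant trip count of 5, an optimizer may unroll this loop
   completely, leaving no trace of LIMIT in the machine code. */
int sum_first_limit(const int *v) {
    int total = 0;
    for (int i = 0; i < LIMIT; ++i)
        total += v[i];
    return total;
}
```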
