dexter

Reputation: 195

What does the following excerpt from 'Modern C' by Jens Gustedt mean?

This is my first C programming book; before it, I took some online courses on the language. It's been a smooth read until the following came up:

Binary representation and the abstract state machine.

Unfortunately, the variety of computer platforms is not such that the C standard can completely impose the results of the operations on a given type. Things that are not completely specified as such by the standard are, for example, how the sign of a signed type is represented (sign representation), and the precision to which a double floating-point operation is performed (floating-point representation). C only imposes properties on representations such that the results of operations can be deduced a priori from two different sources:

  • The values of the operands
  • Some characteristic values that describe the particular platform

For example, the operations on the type size_t can be entirely determined when inspecting the value of SIZE_MAX in addition to the operands. We call the model to represent values of a given type on a given platform the binary representation of the type.

Takeaway - A type’s binary representation determines the results of all operations.

Generally, all information we need to determine that model is within reach of any C program: the C library headers provide the necessary information through named values (such as SIZE_MAX), operators, and function calls.

Takeaway - A type’s binary representation is observable.

(Chapter 5, pages 52-53)

Would someone explain it to me?

Upvotes: 1

Views: 313

Answers (1)

Lundin

Reputation: 213892

the abstract state machine

The abstract machine is a term the C standard uses to describe how a C program is supposed to behave, particularly with regard to code generation, order of execution and optimizations. It's a somewhat advanced topic, so if you are a beginner, I'd advise you to just ignore it for now. Otherwise, I wrote a brief explanation here.

Things that are not completely specified as such by the standard are, for example, how the sign of a signed type is represented (sign representation), and the precision to which a double floating-point operation is performed (floating-point representation).

This refers to integers and floats having different sizes, different signedness formats, different endianness and so on, depending on the system. This means that those types are usually not portable.
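To make this concrete, here is a minimal sketch of how a program can query some of these platform-dependent properties. The function names sign_representation and print_platform_values are made up for illustration; the macros come from the standard headers <limits.h>, <float.h> and <stdint.h>.

```c
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: looks at the two lowest bits of -1.
   Returns 3 for two's complement, 2 for ones' complement,
   1 for sign and magnitude. */
int sign_representation(void)
{
    return -1 & 3;
}

/* Hypothetical helper: prints a few characteristic values of the platform.
   The numbers differ from system to system. */
void print_platform_values(void)
{
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(int)  = %zu\n", sizeof(int));
    printf("INT_MIN      = %d\n", INT_MIN);
    printf("INT_MAX      = %d\n", INT_MAX);
    printf("SIZE_MAX     = %zu\n", (size_t)SIZE_MAX);
    printf("DBL_MANT_DIG = %d\n", DBL_MANT_DIG);
    printf("sign repr.   = %d\n", sign_representation());
}
```

Calling print_platform_values() from main on two different systems may well print different numbers, which is exactly the portability issue described above. On a two's complement platform, sign_representation() returns 3.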

C only imposes properties on representations such that the results of operations can be deduced a priori from two different sources:

  • The values of the operands
  • Some characteristic values that describe the particular platform

This is very broad and doesn't mean much; basically, in some cases the outcome of using an operator is well-defined by the language, and in some cases it is not.

For example, the operations on the type size_t can be entirely determined when inspecting the value of SIZE_MAX in addition to the operands. We call the model to represent values of a given type on a given platform the binary representation of the type.

Generally, all information we need to determine that model is within reach of any C program: the C library headers provide the necessary information through named values (such as SIZE_MAX), operators, and function calls.

This probably means that, for example, we can check whether an operator applied to operands of type size_t will give the expected result:

size_t a = ..., b = ...; // both operands get their values from elsewhere
if(SIZE_MAX - a >= b)    // SIZE_MAX is defined in <stdint.h>
{
  size_t c = a + b; // the sum will not be larger than SIZE_MAX
}
else
{
  // a + b will not fit, handle error
}

It also means that unsigned numbers have a well-defined wrap-around (unlike signed ones), so we can write code such as if(a+b < a) to check whether an addition wrapped around. This behavior is well-defined and portable for unsigned numbers, whereas overflow of signed numbers is undefined behavior.
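The wrap-around check can be sketched as a small helper. The name checked_add_size and its interface are invented for this example and are not from the book; the sketch also assumes that size_t is at least as wide as unsigned int, which holds on common platforms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds a and b.
   Returns true and stores the sum in *sum if the addition did not wrap;
   returns false and leaves *sum untouched otherwise. */
bool checked_add_size(size_t a, size_t b, size_t *sum)
{
    size_t result = a + b;  /* well-defined: wraps modulo SIZE_MAX + 1 */
    if (result < a) {
        return false;       /* the sum wrapped around */
    }
    *sum = result;
    return true;
}
```

For example, checked_add_size(2, 3, &s) succeeds and stores 5 in s, while checked_add_size(SIZE_MAX, 1, &s) reports failure because the mathematical sum does not fit in a size_t. The same trick must not be used with signed types, since the overflow itself is already undefined behavior there.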

In theory, any integer type in C (that isn't a character type) can contain exotic stuff such as padding bits. This is mainly there because C has historically supported exotic signedness formats like ones' complement and sign and magnitude (C23 requires two's complement), but also because an integer type doesn't necessarily have to use all allocated bits. This is all highly theoretical language-lawyer stuff though and of no concern for beginners. In the real world, integers are almost certainly without padding bits and signed integers are almost certainly two's complement.
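For the curious, whether unsigned int has padding bits can be checked from within a program by comparing its storage size with the number of bits needed to hold UINT_MAX. This is only a sketch, and the two function names are made up for illustration.

```c
#include <limits.h>

/* Hypothetical helper: counts the value bits of unsigned int
   by shifting UINT_MAX right until it becomes zero. */
int unsigned_int_value_bits(void)
{
    int bits = 0;
    for (unsigned max = UINT_MAX; max != 0; max >>= 1) {
        ++bits;
    }
    return bits;
}

/* Hypothetical helper: storage bits minus value bits.
   unsigned int has no sign bit, so whatever is left over is padding. */
int unsigned_int_padding_bits(void)
{
    return (int)(sizeof(unsigned) * CHAR_BIT) - unsigned_int_value_bits();
}
```

On mainstream platforms unsigned_int_padding_bits() is expected to return 0, in line with the remark that padding bits are a theoretical concern.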

Another related advanced topic: C has mechanisms to look at the raw binary representation of any variable, byte by byte. We can do so by casting the variable's address to a character pointer and then dereferencing that pointer to read the raw bytes. This isn't always possible in higher-level languages; C is closer to the hardware than most languages.
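A minimal sketch of that byte-by-byte inspection follows. The function name dump_bytes is invented for this example; it uses unsigned char, which is the usual choice for looking at raw bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints the raw bytes of any object in hex,
   lowest address first. */
void dump_bytes(const void *object, size_t size)
{
    /* A character pointer is allowed to point at the bytes of any object. */
    const unsigned char *bytes = object;
    for (size_t i = 0; i < size; ++i) {
        printf("%02x ", (unsigned)bytes[i]);
    }
    putchar('\n');
}
```

Given unsigned value = 1;, the call dump_bytes(&value, sizeof value); shows the 01 byte first on a little-endian machine and last on a big-endian one, so the output itself reveals part of the platform's binary representation.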

Upvotes: 2
