Laurent

Reputation: 589

What is the relationship between memory representation and value of a variable in C?

In C, it's true that:

[8-bit] signed char: -128 to 127 on two's complement machines (pre-C23 standards guarantee only -127 to 127)
[8-bit] unsigned char: 0 to 255

But what really happens in memory? Is a signed char represented in two's complement and an unsigned char stored as a plain binary number with no special encoding (that is, 255 is simply the sequence 11111111)?

How does the executable keep track of the variable type it's reading, to figure out whether the value in the CPU register is to be interpreted as two's complement or not? Is there some metadata that associates a variable name with its type?

Thanks!

Upvotes: 0

Views: 136

Answers (3)

dtech

Reputation: 49279

C is a statically typed language. How a piece of memory is interpreted is determined entirely by context: the type is known at compile time (or known sufficiently well, in the case of dynamic dispatch), and the compiler makes all the decisions in advance. For the sake of performance, runtime checks are kept to a bare minimum; in C there are none unless you implement dynamic dispatch or RTTI manually.

In C (and C++) you can easily interpret the same memory location in different ways: all you have to do is take a pointer to it and cast that pointer to a different pointer type. This is very unsafe if you don't know what you are doing.
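As a minimal sketch of that idea, the two helpers below read one and the same byte through pointers of different types. The function names are made up for illustration, and the signed result assumes a two's complement machine. Character types are used because accessing any object through them is always permitted.

```c
#include <stdio.h>

/* Hypothetical helper: read the byte at p as an unsigned char (0..255). */
static int read_as_unsigned(const void *p)
{
    return *(const unsigned char *)p;
}

/* Hypothetical helper: read the very same byte as a signed char. */
static int read_as_signed(const void *p)
{
    return *(const signed char *)p;
}

/* Hypothetical demo: print both interpretations of one stored byte. */
static void show_both_views(void)
{
    unsigned char byte = 0xFF;  /* bit pattern 11111111 */
    printf("unsigned: %d, signed: %d\n",
           read_as_unsigned(&byte), read_as_signed(&byte));
}
```

For a byte holding 0xFF, the first helper yields 255 and the second yields -1 on a two's complement machine. The memory never changes; only the type used to read it does.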

Upvotes: 2

Fawzan

Reputation: 4849

The internal representation of numbers is not defined by the C language (at least before C23); it is a property of the machine architecture. Most implementations use two's complement because it lets signed and unsigned addition and subtraction share the same binary operation.

FYI: almost all existing CPU hardware uses two's complement, so it makes sense that most programming languages do, too.
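A small sketch of why addition is "the same operation" for both kinds of numbers. The helper names are invented for this example. It relies on int8_t, which is defined to be two's complement, and assumes the signed inputs are small enough that their sum fits in 8 bits.

```c
#include <stdint.h>
#include <string.h>

/* Add two bytes as unsigned numbers, keeping only the low 8 bits. */
static uint8_t add_bits(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);  /* wraps modulo 256 */
}

/* Copy the bit pattern of a signed byte into an unsigned byte, unchanged. */
static uint8_t bits_of(int8_t v)
{
    uint8_t out;
    memcpy(&out, &v, 1);
    return out;
}

/* Add two signed bytes; assumes the mathematical result fits in int8_t. */
static int8_t add_signed(int8_t a, int8_t b)
{
    return (int8_t)(a + b);
}
```

For example, -3 is stored as 0xFD. Adding 5 as unsigned bytes gives 0x102, which truncates to 0x02, the same bit pattern as the signed result 2. So add_bits(bits_of(-3), bits_of(5)) equals bits_of(add_signed(-3, 5)).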

Upvotes: 0

HelloWorld

Reputation: 1863

There is no metadata. The distinction is carried by the instructions themselves: the compiler emits different instructions for operations on these types, and the hardware simply executes them. This becomes obvious when you compare the assembly.

void test1()
{
  char p = 0;   /* plain char is signed on the target used here */
  p += 3;
}

void test2()
{
  unsigned char p = 0;
  p += 3;
}

Below are the instructions the compiler generated from the source above, using clang 3.7 with optimization disabled (-O0). If you are not familiar with assembly you can ignore most of them; focus on movsx and movzx. These two instructions determine how the byte in memory is treated.

test1():                              # Instructions for test1
    push    rbp
    mov     rbp, rsp
    mov     byte ptr [rbp - 1], 0
    movsx   eax, byte ptr [rbp - 1]   # load the byte, sign-extended to 32 bits
    add     eax, 3
    mov     cl, al
    mov     byte ptr [rbp - 1], cl
    pop     rbp
    ret

test2():                              # Instructions for test2
    push    rbp
    mov     rbp, rsp
    mov     byte ptr [rbp - 1], 0
    movzx   eax, byte ptr [rbp - 1]   # load the byte, zero-extended to 32 bits
    add     eax, 3
    mov     cl, al
    mov     byte ptr [rbp - 1], cl
    pop     rbp
    ret
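The effect of the two extension instructions can also be observed from C itself. This is only a sketch with made-up function names. Widening a byte to int is the point where an x86-64 compiler would typically choose a sign-extending or a zero-extending load, and the -1 result assumes two's complement.

```c
/* Widen a signed byte to int: the upper bits are filled with copies of
   the sign bit (what movsx does in the listing above). */
static int widen_signed(signed char c)
{
    return c;
}

/* Widen an unsigned byte to int: the upper bits are filled with zeros
   (what movzx does in the listing above). */
static int widen_unsigned(unsigned char c)
{
    return c;
}
```

With the bit pattern 11111111 in both bytes, widen_signed((signed char)-1) gives -1 while widen_unsigned(0xFF) gives 255. For values whose top bit is clear, such as 3, both functions return the same result.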

Upvotes: 5
