Reputation: 91630
As asked in "How does pointer incrementation work?", I have a follow-up question.
How does a pointer know the underlying size of the data it points to? Do pointers store a size of the underlying type so they can know how to increment?
I'd expect that the following code would move a pointer forward one byte:
int intarr[] = { ... };
int *intptr = intarr;
intptr = intptr + 1;
printf("intarr[1] = %d\n", *intptr);
According to the accepted answer on the linked site, having a pointer increment by bytes and not by the underlying sizeof the pointed element would cause mass hysteria, confusion, and chaos.
While I understand that this would probably be an inevitable outcome, I still don't understand how pointers work in this regard. Couldn't I declare a void pointer to some struct[] type array, and if I did so, how would the void pointer know to increment by sizeof(struct mytype)?
Edit: I believe that I've worked most of the difficulties out that I'm having, but I'm not quite there as far as demonstrating it in code.
See here: http://codepad.org/0d8veP4K
#include <stdio.h>
int main(int argc, char *argv[])
{
    int intarr[] = { 0, 5, 10 };
    int *intptr = intarr;
    // get the value where the pointer points (%p expects a void *)
    printf("intptr(%p): %d\n", (void *)intptr, *intptr);
    printf("intptr(%p): %d\n", (void *)(intptr + 1), *(intptr + 1));
    printf("intptr(%p): %d\n", (void *)(intptr + 2), *(intptr + 2));
    // the difference between the pointer value should be same as sizeof(int)
    // (%td prints the ptrdiff_t result, %zu prints the size_t from sizeof)
    printf("intptr[0]: %p | intptr[1]: %p | difference: %td | expected: %zu\n",
           (void *)intptr, (void *)(intptr + 1), (intptr + 1) - intptr, sizeof(int));
    return 0;
}
Upvotes: 4
Views: 985
Reputation: 120
in your example:
and in your program
Output:
intptr(0xffcbf5dc): 0
intptr(0xffcbf5e0): 5
intptr(0xffcbf5e4): 10
intptr[0]: 0xffcbf5dc | intptr[1]: 0xffcbf5e0 | difference: 1 | expected: 4
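The printed difference is 1 because subtracting two pointers yields a count of elements, not bytes. If you subtract the raw addresses instead, 0xffcbf5e0 - 0xffcbf5dc = 4 (hex subtraction), and this is sizeof(int).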
Upvotes: 0
Reputation:
Do pointers store a size of the underlying type so they can know how to increment?
This question suggests that type information needs to be kept with the object at runtime to make correct decisions on how to perform the correct operations for the type. That's not true. Type information becomes part of the code.
It may be easier to understand if we add a third type into the mix: floating point.
Consider this sample program:
int a,b,c;
float x,y,z;
void f(void)
{
c = a+b*3;
z = x+y*3;
}
(I ask you to think about the float vs. int case first not because it's simpler but because it's more complex. The extra complexity prevents you from taking shortcuts that are tempting but wrong.)
The compiler must translate f into some assembly code that performs two different kinds of addition and multiplication. Although the same operators (+ and *) appear twice in the C code, the assembly code won't look so symmetric. The first half will use the processor's integer registers, integer addition instruction, and integer multiplication instruction, and the second half will use floating point registers, floating point addition, and floating point multiplication. Even the constant 3 will be represented differently in the two places it appears.
At the assembly level, the memory where a, b, c, x, y, and z are stored doesn't need to be tagged because the type information is implicit in the instructions that access that memory. The loads and stores of the integer registers will only be targeted at the memory locations holding a, b, and c.
The C arithmetic operators are overloaded. When translating from a language with an overloaded operator to a language without a corresponding overloaded operator, the type information from the first language becomes part of the name of the operator in the second language. ("Name mangling" when translating from C++ to C is the same thing happening at another level. You could say that assembly language "ADD" (integer) and "FADD" (floating point) instructions are name-mangled + operators.)
Now, about pointer arithmetic. Pointers are just another type to overload. If the expression a=a+1 can generate two different varieties of assembly code depending on whether a is int or float, why not a third variety when a is int *, another when a is struct tm *, and so on?
In the C code, type information is contained in the variable declarations. In the compiler's intermediate representation, the type of every expression is known. In the compiler's output, the necessary pieces of type information are implicit in the machine instructions.
Upvotes: 4
Reputation:
Kind of a crude answer, but it's worth noting at the machine level that data types, as we know them in C, don't exist. We might have arithmetical instructions that operate on integers stored in some general-purpose register, e.g., but there's nothing stored to identify that the contents of some register is actually an int. All the machine sees is a bunch of bits and bytes in various types of memory.
So you might even wonder how it's possible for a compiler to know how to do this:
int z = x + y;
How can it know to do an integer addition here if there's nothing stored when the program is running to identify that the memory regions storing the contents of x and y and z are ints?
And the short/crude answer is that the machine doesn't know once the program is running. Yet the compiler had this information available when it generated the instructions that would be used to run the program.
It's the same case with pointers:
int intarr[] = { ... };
int *intptr = intarr;
Evaluating intptr + 1 here yields an address that is sizeof(int) bytes past intptr. The compiler knows to do this based on the information provided by you, the programmer, in this C code. If you did this instead:
int intarr[] = { ... };
void *voidptr = intarr;
... then trying to perform any arithmetic on voidptr is not valid in standard C (GCC accepts it as an extension and treats the element size as 1), since we aren't giving the information necessary for the compiler to know what machine instructions to generate.
Couldn't I declare a void pointer to some struct[] type array, and if I did so, how would the void pointer know to increment by sizeof(struct mytype)?
It can't. The void pointer would equate to a loss of compile-time information that would prevent the compiler from being able to generate the appropriate instructions. If you don't provide the info, the compiler doesn't know how to do the pointer arithmetic. And this is why functions which accept a void pointer, like memcpy, need a byte size to be specified. The pointee contents don't provide that kind of info. Only the programmer can provide it, since this kind of information is not stored in the memory used by the program at runtime.
Upvotes: 2
Reputation: 153348
It is in the type declaration. The compiler knows the step size for p1 because it is sizeof(*p1), i.e. sizeof(int). It has no step size for p2, as sizeof(void) is not defined.
int *p1;
void *p2;
p1++; // OK
p2++; // Not valid in standard C: void is an incomplete type with no size
Upvotes: 5