Reputation: 1692
While reading about pointers and arrays in C, I got a little confused. On the one hand, an array can be seen as a data type. On the other hand, an array is an unmodifiable lvalue. I imagine that the compiler does something like replacing the array's identifier with a constant address and an expression that calculates the position given by the index at runtime.
myArray[3] -(compiler)-> AE8349F + 3 * sizeof(<type>)
When saying that an array is a data type, what does this exactly mean? I hope you can help me to clarify my confused understanding of what an array really is and how it is treated by the compiler.
Upvotes: 27
Views: 17551
Reputation: 106112
When saying that an array is a data type, what does this exactly mean?
A data type is a set of data with values having predefined characteristics. Examples of data types are: integer, floating-point number, character, string, and pointer.
An array is a group of memory locations related by the fact that they all have the same name and the same type.
If you are wondering why an array is not modifiable, the best explanation I have ever read is this:
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL). B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed-length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
    +----+
V:  |    | -----+
    +----+      |
     ...        |
    +----+      |
    |    | <----+
    +----+
    |    |
    +----+
    |    |
    +----+
    |    |
    +----+
     ...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
    int inumber;
    char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
For more detail you may like to read this answer.
EDIT: For more clarity, the difference between a modifiable l-value, a non-modifiable l-value, and an r-value is, in short:
- A modifiable l-value is addressable (can be the operand of unary &) and assignable (can be the left operand of =).
- A non-modifiable l-value is addressable, but not assignable.
- An r-value is neither addressable nor assignable.
Upvotes: 22
Reputation: 5983
An array is a contiguous block of memory. This means it's laid out in memory sequentially. Let's say we define an array like:
int x[4];
where sizeof(int) == 4, that is, an int is 32 bits wide.
This will be laid out in memory like this (picking an arbitrary starting address, let's say 0x00000001):
0x00000001 - 0x00000004   [element 0]
0x00000005 - 0x00000008   [element 1]
0x00000009 - 0x0000000C   [element 2]
0x0000000D - 0x00000010   [element 3]
You're correct that the compiler replaces the identifier. Strictly speaking, an array is not a pointer, but in C/C++ the array name converts ("decays") to a pointer to the first element of the array in most expressions (a pointer to address 0x00000001 in our example); the main exceptions are when it is the operand of sizeof or unary &. By doing this:
std::cout << x[2];
You're telling the compiler to step 2 elements past that address and read the value stored there, which is pointer arithmetic. Let's say instead you use a variable to index:
int i = 2;
std::cout << x[i];
The compiler treats this as:
int i = 2;
std::cout << *(x + i);
Pointer arithmetic is already scaled by the element size, so you do not multiply by sizeof(int) yourself. At the machine level, the address that gets read is the base address of the array plus i * sizeof(int) bytes. The compiler basically takes the subscript operator [] and converts it to addition with a pointer, followed by a dereference.
If you really want to spin your head around this, consider this code:
std::cout << 2[x];
This is completely valid. If you can figure out why, then you've got the concept down.
Upvotes: 1