Reputation: 311
This question concerns low-level programming; solutions that work at the C or C++ source level do not apply here.
I know that if you go through the ELF file you can see global arrays as chunky symbols. How does the processor know that an address is part of an array? All it sees are addresses with no specific metadata. Or is there some way this information is passed down to the lower levels?
E.g.:
    55: 0000000000201078     0 NOTYPE  GLOBAL DEFAULT   24 _end
    56: 0000000000000540    43 FUNC    GLOBAL DEFAULT   14 _start
    57: 0000000000201070     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    58: 000000000000064a    49 FUNC    GLOBAL DEFAULT   14 main
    59: 0000000000201020    80 OBJECT  GLOBAL DEFAULT   23 test_Var
    60: 0000000000201070     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
The above is readelf -s output. Since I wrote the code, I know that entry 59 refers to the global array test_Var. But when I simulate this code on, say, gem5, or run a Pin tool, how will I know that some data touched by an instruction is part of this array? On top of that, local arrays are not even visible at this stage. So the question is: given an instruction trace, or even a data trace, of this program, how can I know that this particular array is involved at a given instant?
Upvotes: 0
Views: 79
Reputation: 26656
All the metadata the processor needs is encoded within the machine code instructions that it executes. If the processor needs the same information in different parts of the program, the compiler repeats the necessary metadata in the instructions of each of those parts.
The kind of metadata the processor needs is: how big an item is (e.g. byte, half, word, quad), whether it is signed or unsigned, how far to skip ahead for each index position of an array, and so on. Generally speaking, the processor needs a sequence of instructions to carry out any piece of high-level language code, so some metadata is encoded within individual instructions and some in the instruction sequences themselves.
An example is an array that has a particular element type and is, of course, read and written (e.g. indexed) in different parts of the program. The C program encodes type information (metadata) in the array declaration, and this type holds no matter which function accesses the array. However, the processor does not read data declarations, only machine code instructions! So the translation repeats the size and access-pattern information in the machine code instructions and instruction sequences wherever needed, and in this way the compiler ensures consistent access by the processor.
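To make the "how far to skip ahead" point concrete, here is a minimal sketch in C of the address arithmetic a compiler typically bakes into the instructions for an indexed access. The element type int32_t is an assumption (the question only shows that test_Var is 80 bytes), and element_address is a hypothetical helper written for illustration, not something the compiler exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the question's array: 20 elements of 4 bytes gives the
   80-byte size that readelf reported. The element type is an assumption. */
int32_t test_Var[20];

/* The address computation behind test_Var[i], written out by hand:
   base address + index * element size. After compilation the element
   size survives only as a constant (a scale factor or shift) inside the
   instructions; no array metadata travels with the address. */
int32_t *element_address(size_t i)
{
    assert(i < sizeof test_Var / sizeof test_Var[0]);
    return (int32_t *)((char *)test_Var + i * sizeof(int32_t));
}
```

Every function that indexes the array gets its own copy of that constant, which is the sense in which the compiler repeats the metadata.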
Upvotes: 2
Reputation: 365277
It doesn't; this is the point of undefined behaviour in C / C++: the compiler assumes the programmer is right and just does what the source code says. If that leads to accessing a different object that was nearby in memory, that's the programmer's fault, not the compiler's or the CPU's.
We gain efficiency by not even trying to detect this at runtime.
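A trace tool can still recover the association after the fact by comparing each traced address with the symbol's address range. The following is a rough sketch of that check in C, not a gem5 or Pin API: struct sym_range and addr_in_symbol are hypothetical names, the value and size are copied from the readelf output in the question, and load_base is an assumption covering position-independent executables, where the loader adds a base to every symbol value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One symbol-table entry, reduced to the fields readelf -s prints. */
struct sym_range {
    const char *name;
    uint64_t value;  /* symbol value from the ELF file */
    uint64_t size;   /* size in bytes */
};

/* Values copied from the readelf output in the question. */
static const struct sym_range test_var_sym = { "test_Var", 0x201020, 80 };

/* Returns true if addr falls inside [load_base + value, + size).
   load_base is an assumption: pass 0 for a non-PIE binary, or the base
   address the loader chose for a position-independent executable. */
bool addr_in_symbol(uint64_t addr, uint64_t load_base,
                    const struct sym_range *s)
{
    assert(s != NULL);
    uint64_t start = load_base + s->value;
    return addr >= start && addr - start < s->size;
}
```

This only works for objects that have a symbol-table entry with a size, such as globals and statics. Local arrays live on the stack and have no such entry, so identifying them would need debug information or knowledge of the frame layout.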
Upvotes: 1