Febin Sunny

Reputation: 311

How can I deduce whether the address at hand is part of an array, from a trace?

This question concerns low-level programming; solutions that apply to C, C++, etc. do not apply here.

I know that if you go through the ELF file, you can see global arrays as chunky symbols. How does the processor know that an address is part of an array? All it sees are addresses with no specific metadata. Or is there some way this information is passed down to lower levels?

For example:

55: 0000000000201078     0 NOTYPE  GLOBAL DEFAULT   24 _end
56: 0000000000000540    43 FUNC    GLOBAL DEFAULT   14 _start
57: 0000000000201070     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
58: 000000000000064a    49 FUNC    GLOBAL DEFAULT   14 main
59: 0000000000201020    80 OBJECT  GLOBAL DEFAULT   23 test_Var
60: 0000000000201070     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__

The above is readelf -s output. Since I wrote the code, I know entry 59 refers to the global array test_Var. But how will I know whether some data touched by an instruction that I encounter while simulating this code, say on gem5 or while running a Pin tool, is part of this array? Not to mention that I cannot even see local arrays at this stage. So the question is: if I have an instruction trace, or even a data trace, of this program, how can I know that this particular array is involved at a given instant?

Upvotes: 0

Views: 79

Answers (2)

Erik Eidt

Reputation: 26656

All the metadata the processor needs is encoded within the machine code instructions that it executes.  And if the processor needs the same information at different times for different parts of the program, the compiler will repeat any necessary metadata in the instructions of all such parts of the program.

The kind of metadata the processor needs is: how big an item is (e.g. byte, half, word, quad), whether an item is signed or unsigned, how far to skip ahead for each index position of an array, etc. Generally speaking, the processor requires instruction sequences to get any high-level language code done, so some metadata is effectively encoded within individual instructions as well as by the sequences of instructions themselves.

An example here is an array that has a particular data type and, of course, is indexed, read, and written in different parts of the program. The C program encodes type information (metadata) within the C array declaration, and this type holds no matter what function is accessing the array. However, the processor does not read data declarations, only machine code instructions! So the translation repeats the size and access-pattern information in machine code instructions and instruction sequences as needed, and thus the compiler ensures consistent access by the processor.
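
As a rough illustration of that point, the C sketch below spells out the address arithmetic that an indexed access implies. It assumes the question's 80-byte test_Var holds 20 four-byte elements (the element type is not stated in the question), and the helper names element_address and indexed_address are made up for this example. It also assumes a flat address space, where converting a pointer to uintptr_t yields a linear byte address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the question's 80-byte global array.
   ASSUMPTION: 20 elements of 4 bytes each; the real element type is unknown. */
int32_t test_Var[20];

/* The arithmetic the compiler bakes into the instructions for test_Var[i]:
   base address + index * element size. Nothing else about "array-ness"
   reaches the processor. */
uintptr_t element_address(size_t i) {
    assert(i < sizeof test_Var / sizeof test_Var[0]); /* stay inside the array */
    return (uintptr_t)test_Var + i * sizeof test_Var[0];
}

/* The same element reached through ordinary indexing, for comparison. */
uintptr_t indexed_address(size_t i) {
    return (uintptr_t)&test_Var[i];
}
```

On typical platforms both helpers return the same value for any valid index. The element size (4 here) ends up as a scale factor or shift in the generated instructions, which is the only place the processor ever "sees" it.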

Upvotes: 2

Peter Cordes

Reputation: 365277

It doesn't; this is the point of undefined behaviour in C / C++: the compiler assumes the programmer is right and just does what the source code says. If that leads to accessing a different object that was nearby in memory, that's the programmer's fault, not the compiler's or the CPU's.

We gain efficiency by not even trying to detect this at runtime.
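
Since the hardware does not track this, a trace-analysis tool would have to do the bookkeeping itself. The following is only a minimal sketch of one possible approach, not something either answer prescribes: compare each traced address against the start and size of the OBJECT symbols from readelf -s. The struct sym_range type and the object_for_address function are hypothetical names. The sketch also assumes the trace address is in the same address space as the symbol table; for a position-independent executable, the load base would have to be subtracted first.

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol-table entry: name, start address, size in bytes.
   (Hypothetical type, introduced only for this sketch.) */
struct sym_range {
    const char *name;
    uint64_t    start;
    uint64_t    size;
};

/* Values copied from entry 59 of the readelf -s output in the question. */
static const struct sym_range objects[] = {
    { "test_Var", 0x201020, 80 },
};

/* Return the name of the object whose [start, start + size) range contains
   addr, or NULL if no listed object does. The unsigned subtraction wraps to
   a huge value when addr < start, so a single comparison covers both bounds. */
const char *object_for_address(uint64_t addr) {
    for (size_t i = 0; i < sizeof objects / sizeof objects[0]; i++) {
        if (addr - objects[i].start < objects[i].size)
            return objects[i].name;
    }
    return NULL;
}
```

This only covers globals that appear in the symbol table. Local arrays live on the stack and have no symbol, so they would need a different source of information, such as debug info.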

Upvotes: 1
