Reputation: 31
This is a snippet of an assignment regarding data hazards, but I am struggling to understand what it is doing. The words after each : are my understanding of the instructions.
loop:
1. ADDI R2, R2, 1 : Add 1 to R2
2. LD R4, 0(R3) : Load data at R3 address into R4 (?)
3. LD R5, 4(R3) : Load data at R3 address into R5 (?)
4. SLT R6, R4, R5 : Set R6 = R4 < R5 ? 0 : 1
5. SD R6, 8(R3) : Store data in R6 at R3 address (?)
6. ADDI R3, R3, 1 : Add 1 to R3 (?)
7. BNEZ R2, loop : If R2 == 0, goto 1, else proceed to 8
8. ADD R11, R12, R13 : ???
Notes
R2 = -2
Questions
0(R3), 4(R3), 8(R3)?
R2: -2 -> -1 -> 0
Thanks all!
Found this community-effort cheatsheet, and IBM docs
Upvotes: 1
Views: 841
Reputation: 1688
I've edited the list with what each instruction does:
1. ADDI R2, R2, 1 : Add 1 to R2 and store the result in R2
2. LD R4, 0(R3) : Load data at R3 address into R4
3. LD R5, 4(R3) : Load data at (R3 address + 4) into R5
4. SLT R6, R4, R5 : R6 = 1 if R4 < R5, otherwise R6 = 0
5. SD R6, 8(R3) : Store data in R6 at (R3 address + 8)
6. ADDI R3, R3, 1 : Add 1 to R3 and store the result in R3
7. BNEZ R2, loop : If R2 != 0, go to 1, else proceed to 8
8. ADD R11, R12, R13 : Add R13 to R12 and store the result in R11
The number in front of (R3) is a temporary offset that is added to R3 before it is dereferenced. In other words, LD R5,4(R3) has the same effect on registers as:
ADDI R3,R3,4 ;add 4 to the value in R3, and store the result in R3
LD R5,(R3) ;treating the value in R3 as a memory address,
;dereference it and store the integer at that address into R5
SUBI R3,R3,4 ;return R3 to its original state.
Except this all happens in one instruction and no modification to R3 actually takes place.
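To make the offset rule concrete, here is a minimal Python sketch of a load with a displacement. It is a toy model, not a real simulator: the `load_word` helper, the register dictionary, and the memory contents are all made up for illustration, and it assumes 32-bit big-endian words as in the rest of this answer.

```python
# Toy model of base+displacement addressing (hypothetical helper, made-up data).
memory = bytearray(16)
memory[4:8] = bytes([0x12, 0x34, 0x56, 0x78])  # a made-up word stored at offset 4

regs = {"R3": 0, "R5": 0}  # R3 holds the base address (0 here for simplicity)


def load_word(regs, memory, dest, offset, base):
    """Model LD dest, offset(base) as a 32-bit big-endian load."""
    address = regs[base] + offset  # effective address, computed on the fly
    regs[dest] = int.from_bytes(memory[address:address + 4], "big")
    # regs[base] is never written, so the base register keeps its value


load_word(regs, memory, "R5", 4, "R3")
print(hex(regs["R5"]), regs["R3"])  # prints: 0x12345678 0
```

The point of the sketch is the last comment in `load_word`: the sum of base and offset only exists for the duration of the access.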
R3 isn't really important in the same way as R2 is. R2 is being used as a loop counter, whereas R3 is being used as a pointer to memory (what it's pointing to, I have no idea). ADDI R3,R3,1 adds 1 to R3 and stores the result in R3.
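As a rough illustration of the loop-counter role, the sketch below counts how many times the loop body runs. It assumes R2 starts at -2 (taken from the question's notes) and that instruction 7 branches back to loop while R2 is nonzero, which is the usual meaning of BNEZ. The variable names are mine.

```python
# Toy model of the loop counter; R2 = -2 comes from the question's notes.
r2 = -2
passes = 0
while True:
    r2 += 1        # 1. ADDI R2, R2, 1
    passes += 1    # instructions 2-6 (the loop body) would run here
    if r2 == 0:    # 7. BNEZ R2, loop falls through once R2 reaches 0
        break

print(passes, r2)  # prints: 2 0
```

Under those assumptions R2 goes -2 -> -1 -> 0, so the body executes twice before control reaches instruction 8.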
Presumably, R3 is intended to point to a 32-bit integer. This is implied by all the offsets being multiples of four. For illustration purposes, let's pretend that at the start of the loop, R3 = 0x40000000. All the bytes stored at these addresses are made up by me, with the exception of the bytes stored at 0x40000008-0x4000000B, which were written to memory by the instruction SD R6,8(R3). (I'm assuming a big-endian architecture, hence the byte order.)
0x40000000: 0xDE
0x40000001: 0xAD
0x40000002: 0xBE
0x40000003: 0xEF
0x40000004: 0x12
0x40000005: 0x34
0x40000006: 0x56
0x40000007: 0x78
0x40000008: 0x00
0x40000009: 0x00
0x4000000A: 0x00
0x4000000B: 0x01
After instruction 6 in your list executes, R4 contains 0xDEADBEEF and R5 contains 0x12345678. That's fine, but the problem is we added 1 to R3 instead of 4. This means that the numbers we're loading into R4 and R5 on the subsequent passes through the loop aren't the intended data, but rather junk that was Frankensteined together from different values. Here's what we have after the second pass:
0x40000000: 0xDE
0x40000001: 0xAD
0x40000002: 0xBE
0x40000003: 0xEF
0x40000004: 0x12
0x40000005: 0x34
0x40000006: 0x56
0x40000007: 0x78
0x40000008: 0x00
0x40000009: 0x00
0x4000000A: 0x00
0x4000000B: 0x00
0x4000000C: 0x01
Here, R4 = 0xADBEEF12 and R5 = 0x34567800. In order to correctly iterate through memory, we need to change ADDI R3,R3,1 to ADDI R3,R3,4.
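The two passes above can be replayed with a short Python sketch. This is only a model under the same assumptions as the example (32-bit big-endian words, signed comparison for SLT, the made-up bytes 0xDEADBEEF and 0x12345678); the `run_loop` function is hypothetical, and index 0 of the byte array stands in for address 0x40000000.

```python
def run_loop(stride, passes=2):
    """Replay instructions 2-6 with ADDI R3, R3, <stride> on a toy memory."""
    memory = bytearray(32)
    memory[0:8] = bytes.fromhex("DEADBEEF12345678")  # made-up data from the example
    r3 = 0  # stands in for 0x40000000
    loaded = []
    for _ in range(passes):
        r4 = int.from_bytes(memory[r3:r3 + 4], "big", signed=True)      # LD R4, 0(R3)
        r5 = int.from_bytes(memory[r3 + 4:r3 + 8], "big", signed=True)  # LD R5, 4(R3)
        r6 = 1 if r4 < r5 else 0                                        # SLT R6, R4, R5
        memory[r3 + 8:r3 + 12] = r6.to_bytes(4, "big")                  # SD R6, 8(R3)
        loaded.append((r4 & 0xFFFFFFFF, r5 & 0xFFFFFFFF))               # record as unsigned hex
        r3 += stride                                                    # ADDI R3, R3, stride
    return loaded, memory


for stride in (1, 4):
    loaded, _ = run_loop(stride)
    print(stride, [(hex(a), hex(b)) for a, b in loaded])
```

With a stride of 1 the second pass loads 0xADBEEF12 and 0x34567800, matching the values above. With a stride of 4 the second pass loads the intact words 0x12345678 and 0x00000001 (the result stored by the first pass).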
Now why would the CPU even let you do this? Well, some CPUs actually don't, and will fault if you try to read from or write to an unaligned address. Others, like x86, aren't so picky. As it turns out, the CPU has no idea what type your data is, and relies on the programmer or compiler to enforce type rules.
Upvotes: 1