Reputation: 582
Given this piece of asm code:
(Line 1) .L6:
(Line 2) movsd -8(%rdx,%rax,8), %xmm0
(Line 3) .L2:
(Line 4) addsd (%rcx,%rax,8), %xmm0
(Line 5) movsd %xmm0, (%rdx,%rax,8)
(Line 6) addq $1, %rax
(Line 7) cmpq %rax, %r8
(Line 8) jne .L6
1) What is the dependency on xmm0 between Line 2 and Line 4? Is my guess that it is Read After Write correct?
2) What about the dependency between Lines 4 and 5? In Line 4, xmm0 seems to be both read and written (in that order), and in Line 5 it is read and then copied to the location (%rdx,%rax,8). So is it Read After Read, or Read After Write?
I'm confused because more than two operations (read, write, read) are happening, so I'm not sure which ones count when looking at data dependencies.
This is the C code:
void randomgenericfunction(double a[], double p[], long n)
{
    long i;
    p[0] = a[0];
    for (i = 1; i < n; i++) {
        p[i] = p[i-1] + a[i];
    }
    return;
}
Any help will be appreciated, thanks!
Upvotes: 2
Views: 183
Reputation: 364180
addsd reads and writes its destination operand, so the read side of that is a RAW relative to the earlier load.
The write side would be a WAW, but the write can't happen until after the read (because addsd can't produce a value until both its inputs are ready), so you wouldn't normally call that a separate hazard. Any standard way of handling the RAW dependency would already take care of the WAW hazard.
E.g. register renaming (Tomasulo) would keep track of the fact that addsd is reading the load result, and later reads of XMM0 are reading the addsd result, until the next write creates another version of the register. (This is a lot like SSA, Static Single Assignment.) Or in any kind of pipeline, there won't be a value to write back until after the load value has been safely read.
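To make the "versions of a register" idea concrete, here is a toy sketch in C. It is not how any real CPU implements renaming (there is no free list, no retirement, and physical registers are never recycled), and every name in it (RenameTable, rename_read, rename_write, rename_loop_body) is made up for illustration. It only tracks which version of XMM0 each instruction of the loop body would see.

```c
#include <assert.h>

/* Toy model: one architectural register (XMM0) and a small pool of
 * physical registers that is never recycled. All names are hypothetical. */
enum { XMM0 = 0, NUM_ARCH = 1, NUM_PHYS = 16 };

typedef struct {
    int map[NUM_ARCH];  /* architectural register -> current physical register */
    int next_phys;      /* next unused physical register */
} RenameTable;

void rename_init(RenameTable *t)
{
    t->map[XMM0] = 0;   /* P0 holds whatever XMM0 contained before the loop */
    t->next_phys = 1;
}

/* A read uses whichever version is current at rename time. */
int rename_read(const RenameTable *t, int arch)
{
    return t->map[arch];
}

/* Every write allocates a fresh version, like a new SSA name. */
int rename_write(RenameTable *t, int arch)
{
    assert(t->next_phys < NUM_PHYS);  /* the toy pool is finite */
    t->map[arch] = t->next_phys++;
    return t->map[arch];
}

/* Rename the XMM0 operands of one loop iteration, in program order.
 * out[0]: version written by the Line 2 load
 * out[1]: version read by the Line 4 addsd
 * out[2]: version written by the Line 4 addsd
 * out[3]: version read by the Line 5 store */
void rename_loop_body(RenameTable *t, int out[4])
{
    out[0] = rename_write(t, XMM0);  /* movsd -8(%rdx,%rax,8), %xmm0 */
    out[1] = rename_read(t, XMM0);   /* addsd source: read before allocating the destination */
    out[2] = rename_write(t, XMM0);  /* addsd destination */
    out[3] = rename_read(t, XMM0);   /* movsd %xmm0, (%rdx,%rax,8) */
}
```

Calling rename_init once and then rename_loop_body twice gives versions 1, 1, 2, 2 for the first iteration and 3, 3, 4, 4 for the second. The addsd reads exactly the version the load wrote, the store reads exactly the version the addsd wrote, and the next iteration's load gets a version nobody else is using, so only the RAW links remain.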
I guess you could imagine a case like mulsd %xmm0, %xmm5 between the load and the addsd, and you need it to read the load result, not the addsd result, even if XMM5 was the result of a cache-miss load and isn't available until long after XMM0, so the addsd could have executed before an earlier read of XMM0. Obviously an in-order pipeline would have stalled waiting for XMM5 to be ready before execution could reach the addsd that reads+writes XMM0, and a register-renaming pipeline (like all modern x86) would handle it via renaming.
"dependency between line 4 and 5? So is it Read after Read?"
That's not hazardous. It's always safe to have multiple concurrent readers of the same register or value. There's no such thing as a RAR hazard; at least one write must be involved for things to get tricky.
Yes, storing the addsd result is a RAW hazard (true dependency). The store-data uop needs to wait for the result from that FP add.
(Fun fact: on Intel CPUs, the store-address uop runs on a separate execution port, independently of the store data, writing the address into the store-buffer entry that was allocated during issue/alloc/rename when the uops of this instruction entered the out-of-order back-end. The store-address uop only reads %rdx and %rax, so it has a RAW dependency on the addq $1, %rax in the previous iteration, assuming this is a loop.)
So line 5 has a RAW dependency on line 4, and is not directly dependent on line 2 in any way. (The load was earlier in the dependency chain leading to the store, but separated from it by going through the addsd.)
Since the jne .L6 at the bottom jumps back to .L6 at the top, this is a loop, so there's a WAR anti-dependency between the store at the bottom, which reads XMM0, and the load in the next iteration, which writes XMM0.
Upvotes: 2
Reputation: 58052
Line 2 only writes xmm0, line 4 reads and then writes it, line 5 only reads it. So 2 to 4 and 4 to 5 are both read after write.
I suppose you could argue that 4 to 5 is also read after read, but that's not really a dependency since two reads don't have any effect on each other. If line 4 were changed to only read xmm0, and not write it, then it would be perfectly fine for a compiler or CPU to reorder it with line 5. So that second "dependency" isn't worth mentioning.
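If it helps, the read/write bookkeeping above can be written down mechanically. The following is a minimal sketch in C, assuming a made-up Insn struct with one bitmask of registers read and one of registers written; the names (Insn, classify, the REG_* and HAZ_* constants, LINE4_READ_ONLY) are all hypothetical. It only looks at registers, ignores memory and flags, and compares two instructions in isolation.

```c
#include <stdio.h>

/* Hypothetical register IDs, one bit each (illustration only). */
enum { REG_XMM0 = 1u << 0, REG_RAX = 1u << 1, REG_RCX = 1u << 2, REG_RDX = 1u << 3 };

/* Hazard kinds, combinable as a bitmask. There is deliberately no RAR. */
enum { HAZ_NONE = 0, HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

typedef struct {
    const char *text;  /* the instruction, for printing */
    unsigned reads;    /* registers read (address registers included) */
    unsigned writes;   /* registers written */
} Insn;

/* The three instructions from the question that touch xmm0. */
const Insn LINE2 = { "movsd -8(%rdx,%rax,8), %xmm0", REG_RDX | REG_RAX, REG_XMM0 };
const Insn LINE4 = { "addsd (%rcx,%rax,8), %xmm0", REG_RCX | REG_RAX | REG_XMM0, REG_XMM0 };
const Insn LINE5 = { "movsd %xmm0, (%rdx,%rax,8)", REG_XMM0 | REG_RDX | REG_RAX, 0 };

/* Hypothetical variant of line 4 that reads xmm0 but does not write it. */
const Insn LINE4_READ_ONLY = { "(read-only use of %xmm0)", REG_RCX | REG_RAX | REG_XMM0, 0 };

/* Classify register hazards of `later` relative to `earlier`.
 * Pairwise only: a write by some instruction in between is not modelled,
 * so this is meant for adjacent links in a chain of uses. */
unsigned classify(const Insn *earlier, const Insn *later)
{
    unsigned h = HAZ_NONE;
    if (earlier->writes & later->reads)  h |= HAZ_RAW;  /* true dependency */
    if (earlier->reads  & later->writes) h |= HAZ_WAR;  /* anti-dependency */
    if (earlier->writes & later->writes) h |= HAZ_WAW;  /* output dependency */
    return h;
}

void print_hazards(const Insn *earlier, const Insn *later)
{
    unsigned h = classify(earlier, later);
    printf("%s -> %s:%s%s%s%s\n", earlier->text, later->text,
           (h & HAZ_RAW) ? " RAW" : "",
           (h & HAZ_WAR) ? " WAR" : "",
           (h & HAZ_WAW) ? " WAW" : "",
           (h == HAZ_NONE) ? " none" : "");
}
```

With these definitions, classify(&LINE2, &LINE4) reports RAW plus the WAW that the RAW already covers, classify(&LINE4, &LINE5) reports only RAW, and classify(&LINE4_READ_ONLY, &LINE5) reports nothing, since two reads of the same register (and of %rax) don't conflict. Passing LINE5 as the earlier instruction and LINE2 as the later one models the store followed by the next iteration's load, and reports WAR. Because the check is pairwise, comparing LINE2 directly with LINE5 would also flag RAW even though the addsd in between overwrites xmm0, so treat the result as a starting point rather than a full dependency analysis.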
Upvotes: 2