Megan Darcy

Reputation: 582

What is the dependency between these variables?

Given this piece of asm code:

(Line 1) .L6:
(Line 2) movsd -8(%rdx,%rax,8), %xmm0 
(Line 3) .L2:
(Line 4) addsd (%rcx,%rax,8), %xmm0
(Line 5) movsd %xmm0, (%rdx,%rax,8)
(Line 6) addq $1, %rax
(Line 7) cmpq %rax, %r8
(Line 8) jne .L6

1 ) What is the dependency between xmm0 in Line 2 and Line 4? Is my guess of it being Read after Write correct?

2 ) What about the dependency between line 4 and 5? In line 4, it does seem like xmm0 is being both read and written (in that order). And in line 5 it's being read and then copied into the location (%rdx,%rax,8). So is it Read after Read? Or Read After Write?

I'm confused because there are more than two operations happening (read, write, read), so I'm not sure which ones should be considered when looking at data dependencies.

This is the C code:

void randomgenericfunction(double a[], double p[], long n)
{
     long i;
     p[0] = a[0];
     for (i=1; i<n; i++) {
         p[i] = p[i-1] + a[i];
     }
     return;
}

Any help will be appreciated, thanks!

Upvotes: 2

Views: 183

Answers (2)

Peter Cordes

Reputation: 364180

addsd reads and writes its destination operand, so the read side of that is a RAW relative to the earlier load.

The write side would be a WAW, but the write can't happen until after the read (because addsd can't produce a value until both its inputs are ready), so you wouldn't normally call that a separate hazard. Any standard way of handling the RAW dependency already takes care of the WAW hazard.

For example, register renaming (Tomasulo's algorithm) keeps track of the fact that addsd is reading the load result, and that later reads of XMM0 are reading the addsd result, until the next write creates another version of the register. (This is a lot like SSA, static single assignment.) Or in any kind of pipeline, there won't be a value to write back until after the load value has been safely read.

I guess you could imagine a case like mulsd %xmm0, %xmm5 between the load and the addsd. The mulsd needs to read the load result, not the addsd result. If XMM5 was the result of a cache-miss load and isn't available until long after XMM0, the addsd could otherwise execute, and overwrite XMM0, before that earlier read of XMM0 has happened (a WAR hazard). Obviously an in-order pipeline would have stalled waiting for XMM5 to be ready before execution could reach the addsd that reads and writes XMM0, and a register-renaming pipeline (like all modern x86) would handle it via renaming.


dependency between line 4 and 5? So is it Read after Read?

That's not hazardous. It's always safe to have multiple concurrent readers of the same register or value. There's no such thing as a RAR hazard; at least one write must be involved for things to get tricky.

Yes, storing the addsd result is a RAW hazard (true dependency). The store-data uop needs to wait for the result from that FP add.

(Fun fact: on Intel CPUs, the store-address uop runs on a separate execution port, independently of the store data, writing the address into the store-buffer entry that was allocated during issue/alloc/rename when the uops of this instruction entered the out-of-order back-end. The store-address uop only reads %rdx and %rax, so it has a RAW dependency on the addq $1, %rax in the previous iteration, assuming this is a loop.)

So line 5 has a RAW dependency on line 4, and is not directly dependent on line 2 in any way. (The load was earlier in the dependency chain leading to the store, but separated from it by going through the addsd.)


Since jne .L6 jumps back to the top, this is a loop, so there's also a WAR anti-dependency on XMM0 between the store at the bottom (which reads it) and the load in the next iteration (which writes it).


Upvotes: 2

Nate Eldredge

Reputation: 58052

Line 2 only writes xmm0, line 4 reads and then writes it, line 5 only reads it. So 2 to 4 and 4 to 5 are both read after write.

I suppose you could argue that 4 to 5 is also read after read, but that's not really a dependency since two reads don't have any effect on each other. If line 4 were changed to only read xmm0, and not write it, then it would be perfectly fine for a compiler or CPU to reorder it with line 5. So that second "dependency" isn't worth mentioning.

Upvotes: 2
