Reputation: 69
I am looking for an instruction like PHADDD, but for quadwords. PHADDQ does not exist; is there an instruction that does the same thing?
Upvotes: 2
Views: 215
Reputation: 365277
phaddd is no faster than 2 shuffles + a vertical add, so it's only worth considering when you're using 2 separate inputs. If you were planning to use it with both inputs the same, just use pshufd to copy+swap into another vector. (Or if you just want a scalar horizontal sum, even movhlps can be worth considering to extract the high 64 bits into another register.)
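For the single-input case, here is a minimal sketch in C with SSE2 intrinsics, assuming an x86-64 target. The helper name hsum_epi64 is made up for illustration; _mm_shuffle_epi32 is the intrinsic for pshufd and _mm_add_epi64 is the one for paddq.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: scalar sum of the two 64-bit elements of v. */
static inline int64_t hsum_epi64(__m128i v)
{
    /* pshufd with immediate 0x4E: copy v with its 64-bit halves swapped. */
    __m128i swapped = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    /* paddq: both elements of sum now hold low + high. */
    __m128i sum = _mm_add_epi64(v, swapped);
    /* movq: extract the low 64 bits (x86-64 only). */
    return _mm_cvtsi128_si64(sum);
}
```

A compiler is free to pick a different shuffle than the one written, so check the generated asm if the exact instruction matters.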
To fully emulate phaddq, you just need two shuffles to take your A B and C D inputs and give you A C and B D vectors, which you can then add with paddq to get the A+B and C+D elements. That's what punpcklqdq and punpckhqdq do (unpack quadword to double quadword).
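The same thing with intrinsics, as a rough sketch assuming SSE2 and elements listed low-first. The name hadd_epi64 is made up here (there is no such intrinsic); _mm_unpacklo_epi64 and _mm_unpackhi_epi64 are the intrinsics for punpcklqdq and punpckhqdq.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical phaddq emulation.
 * a = [A, B], b = [C, D] (low element first) -> [A+B, C+D] */
static inline __m128i hadd_epi64(__m128i a, __m128i b)
{
    __m128i lo = _mm_unpacklo_epi64(a, b);  /* punpcklqdq: [A, C] */
    __m128i hi = _mm_unpackhi_epi64(a, b);  /* punpckhqdq: [B, D] */
    return _mm_add_epi64(lo, hi);           /* paddq:      [A+B, C+D] */
}
```

That's two shuffles plus one vertical add, the same amount of work phaddd does for dwords.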
Upvotes: 3