Lukas Hohl

Reputation: 69

Intel 64 / IA-32 Packed Horizontal ADD for Quadwords?

I am looking for an instruction like PHADDD, but for quadwords. PHADDQ does not exist; is there an equivalent instruction?

Upvotes: 2

Views: 215

Answers (1)

Peter Cordes

Reputation: 365277

phaddd is no faster than 2 shuffles + a vertical add (on mainstream Intel CPUs it decodes to 2 shuffle uops plus an add uop anyway), so it's only worth considering when you're combining 2 separate inputs.
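To illustrate the "2 shuffles + a vertical add" equivalence, here is a minimal sketch that reproduces phaddd's result using only SSE2 intrinsics. The function name hadd_epi32_sse2 is hypothetical. It borrows shufps (via float casts) as a two-source dword shuffle; that is a common trick, but on some microarchitectures moving integer data through a float shuffle may cost an extra bypass-delay cycle.

```c
#include <emmintrin.h> /* SSE2; also pulls in SSE's _mm_shuffle_ps */
#include <stdint.h>

/* Hypothetical helper: same result as phaddd / _mm_hadd_epi32(a, b),
 * i.e. { a0+a1, a2+a3, b0+b1, b2+b3 }, built from 2 shuffles + 1 add. */
__m128i hadd_epi32_sse2(__m128i a, __m128i b)
{
    __m128 fa = _mm_castsi128_ps(a);
    __m128 fb = _mm_castsi128_ps(b);
    /* shufps: even dwords { a0, a2, b0, b2 } */
    __m128i even = _mm_castps_si128(_mm_shuffle_ps(fa, fb, _MM_SHUFFLE(2, 0, 2, 0)));
    /* shufps: odd dwords { a1, a3, b1, b3 } */
    __m128i odd = _mm_castps_si128(_mm_shuffle_ps(fa, fb, _MM_SHUFFLE(3, 1, 3, 1)));
    /* paddd: vertical add gives the pairwise (horizontal) sums */
    return _mm_add_epi32(even, odd);
}
```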

If you were planning to use it with both inputs the same, just use pshufd to copy-and-swap the two qwords into another vector, then paddq. (Or, if you just want a scalar horizontal sum, even movhlps can be worth considering to extract the high 64 bits into another register.)
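A rough sketch of both single-input variants with SSE2 intrinsics follows. The function names are hypothetical. The first uses pshufd to swap the qwords and paddq so that both elements hold the sum; the second uses movhlps (through float casts) to bring the high qword down before adding, then moves the low qword into a general-purpose register. _mm_cvtsi128_si64 is only available when compiling for x86-64.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: both qwords of the result hold v0 + v1. */
__m128i hsum_epi64_broadcast(__m128i v)
{
    /* pshufd with dword order 2,3,0,1 swaps the two qwords: { v1, v0 } */
    __m128i swapped = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_epi64(v, swapped); /* paddq: { v0+v1, v1+v0 } */
}

/* Hypothetical helper: scalar horizontal sum (x86-64 only, due to movq to a GPR). */
int64_t hsum_epi64_scalar(__m128i v)
{
    __m128 f = _mm_castsi128_ps(v);
    /* movhlps: low 64 bits of hi = high 64 bits of v */
    __m128i hi = _mm_castps_si128(_mm_movehl_ps(f, f));
    __m128i sum = _mm_add_epi64(v, hi); /* low qword = v0 + v1 */
    return _mm_cvtsi128_si64(sum);      /* movq: extract the low qword */
}
```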


To fully emulate phaddq, you just need two shuffles that take your A B and C D inputs and produce A C and B D vectors, which you can then add (paddq) to get the A+B and C+D elements. That's exactly what punpcklqdq and punpckhqdq do (unpack low/high quadwords to double-quadword).
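A minimal sketch of that punpcklqdq / punpckhqdq / paddq sequence with SSE2 intrinsics. There is no real phaddq instruction or _mm_hadd_epi64 intrinsic, so hadd_epi64 below is a hypothetical name; inputs a = { A, B } and b = { C, D } are listed element 0 first.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper emulating a "phaddq": returns { A+B, C+D }
 * for a = { A, B } and b = { C, D }. */
__m128i hadd_epi64(__m128i a, __m128i b)
{
    __m128i lo = _mm_unpacklo_epi64(a, b); /* punpcklqdq: { A, C } */
    __m128i hi = _mm_unpackhi_epi64(a, b); /* punpckhqdq: { B, D } */
    return _mm_add_epi64(lo, hi);          /* paddq: { A+B, C+D } */
}
```

Compilers will typically copy one input with a movdqa (or use the VEX three-operand forms with AVX) because both unpacks read the same two sources.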

Upvotes: 3
