warunapww

Reputation: 1006

Operands for PADDD instruction

I wrote a simple vector addition program in C using vector intrinsics. It loads two vectors, adds them, and finally stores the result vector back to global memory.

When I check the assembly code, it has the following sequence of instructions:

movdqa  0(%rbp,%rax), %xmm7    
paddd (%r12,%rax), %xmm7
movdqa  %xmm7, (%rbx,%rax)

As you can see, it moves only one operand of the paddd instruction into a register (xmm7). The first operand of the paddd instruction refers to an address in global memory instead of being moved into a register first.

Does this mean that when paddd gets executed, it first does a mov from global memory into a register and then adds the two operands, which are then both in registers? That would be equivalent to the following code sequence:

movdqa  0(%rbp,%rax), %xmm7
movdqa  0(%r12,%rax), %xmm8    
paddd %xmm8, %xmm7
movdqa  %xmm7, (%rbx,%rax)

Let me know if you need more information, such as a compilable program, so that you can generate the assembly yourself.

Upvotes: 1

Views: 1142

Answers (1)

Peter Cordes

Reputation: 365577

Most x86 instructions can be used with a memory source operand, so no extra register is needed. Read-modify instructions are just as fast as the combination of a separate load followed by the operation. The advantage is that they take fewer instruction bytes and leave a register free.

It can also execute more efficiently in some cases on Intel CPUs (uop micro-fusion). So if you don't need the data at that memory address again soon, prefer folding loads into other instructions.
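For reference, here is a minimal C sketch of the kind of loop that could produce the assembly in the question. It is not the asker's actual program: the function name add_i32, the requirement that all pointers are 16-byte aligned, and the requirement that n is a multiple of 4 are assumptions made for illustration. The source asks for two explicit loads, but a compiler is free to fold one of them into the memory operand of paddd; whether it does depends on the compiler and its options.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_si128, _mm_add_epi32, ... */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for 32-bit integers.
 * Assumes a, b and c are 16-byte aligned and n is a multiple of 4. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
{
    /* Aligned loads and stores (movdqa) require 16-byte alignment. */
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);
    assert((uintptr_t)c % 16 == 0);
    assert(n % 4 == 0);

    for (size_t i = 0; i < n; i += 4) {
        /* Two loads are written here; the compiler may emit one of them
         * as a memory source operand of paddd instead of a separate movdqa. */
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        __m128i sum = _mm_add_epi32(va, vb);   /* four 32-bit adds (paddd) */
        _mm_store_si128((__m128i *)(c + i), sum);
    }
}
```

Compiling this with optimization enabled and asking for assembly output (for example gcc -O2 -S) and then reading the loop body is one way to check whether a load was folded on your compiler.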

See http://agner.org/optimize/ for docs on CPU internals, and how to optimize your asm and C code.

Upvotes: 6
