user3475849

Reputation: 1

replicating x64 MOVQ in x86 assembly

How could I go about replicating an x64 MOVQ (move quadword) instruction in x86 assembly?

For example, given:

movq xmm5, [esi+2h]
movq [edi+0f1h], xmm5

Would this work?

 push eax
 push edx
 mov eax, [esi+2h]
 mov edx, [esi+6h] ; +4 byte offset
 mov [edi+0f1h], eax
 mov [edi+0f5h], edx  ; +4 byte offset
 pop edx
 pop eax

Upvotes: 0

Views: 902

Answers (2)

Peter Cordes

Reputation: 365247

SSE2 movq xmm, xmm/m64 works in 32-bit code (on CPUs that support it). The code you showed already uses 32-bit addressing modes, so it will work unchanged in 32-bit mode. There is another form of movq that only works in 64-bit mode because it needs a REX.W prefix: movq xmm, r/m64. That is the opcode that lets you write movq xmm0, rax; it also has a memory-source form, but your code doesn't need it.

But anyway, 32-bit SSE2:

movq    xmm5, [esi+2h]
movq    [edi+0f1h], xmm5
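
If you're writing C rather than asm, the same 64-bit load/store pair can be sketched with SSE2 intrinsics. This is only an illustration: copy_qword and HAVE_SSE2 are names made up for this example, and the memcpy branch is a fallback I added for builds without SSE2. Compilers typically lower the two intrinsics to the movq forms above, but that is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Detect SSE2 at compile time (GCC/Clang define __SSE2__; MSVC uses _M_X64 / _M_IX86_FP). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical helper: copy 8 bytes from src to dst.
   With SSE2 this is one 64-bit load and one 64-bit store; no alignment is required. */
static void copy_qword(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
#if HAVE_SSE2
    __m128i v = _mm_loadl_epi64((const __m128i *)src);  /* movq xmm, m64 (upper half zeroed) */
    _mm_storel_epi64((__m128i *)dst, v);                /* movq m64, xmm */
#else
    memcpy(dst, src, 8);  /* fallback: the compiler picks the instructions */
#endif
}
```

For the addresses in the question you would call something like copy_qword(dst + 0xf1, src + 2), with src and dst being byte pointers.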

If you can only assume SSE1 but not SSE2, you can use movlps:

;; xorps  xmm5, xmm5          ; optional: breaks the dependency on the old value of xmm5
movlps   xmm5, [esi+2h]       ; merges into the low half of xmm5, so it has a false dependency
movlps   [edi+0f1h], xmm5

Depending on what you're doing, it could possibly be worth it to use MMX if you have it but not SSE1:

movq    mm0, [esi+2h]
movq    [edi+0f1h], mm0

; emms is required later (e.g. after the loop), before any x87 code runs.

If you really want a single-instruction 64-bit load/store so it's atomic (on P5 and later) for aligned addresses, then fild/fistp is a good choice. (gcc uses this for std::atomic<int64_t> with -m32 -mno-sse.)

It will never munge your data unless you (or MSVC++'s CRT) have the x87 precision bits set to less than a 64-bit mantissa.

fild    qword ptr [esi+2h]
fistp   qword ptr [edi+0f1h]
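
If the atomicity is what you're after and you're in C, you can let the compiler choose the instructions. A minimal sketch using C11 atomics follows; copy_qword_atomic is a name made up for this example. I'm assuming gcc treats C11 _Atomic int64_t the same way as the std::atomic<int64_t> case mentioned above, which I haven't verified for every version. Note that _Atomic objects are naturally aligned, unlike the [esi+2h] address in the question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy one 64-bit value so that the load and the store
   are each atomic. Relaxed ordering: only atomicity is requested, not ordering. */
static void copy_qword_atomic(_Atomic int64_t *dst, _Atomic int64_t *src)
{
    assert(dst != NULL && src != NULL);
    int64_t v = atomic_load_explicit(src, memory_order_relaxed);
    atomic_store_explicit(dst, v, memory_order_relaxed);
}
```

The copy as a whole is still two separate operations: another thread can modify the source between the load and the store.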

fild / fistp might even have better throughput for copying scattered 64-bit chunks than using 32-bit integer load/store, at least on modern CPUs. For contiguous copies of maybe 32 or 64 bytes or larger, use rep movsd. (Usually the threshold for rep movsd being worth it is much higher, but we're talking about without SIMD vectors and with only 32-bit integer or 64-bit fild/fistp multi-uop load/store instructions.)


With plain integer instructions, just pick a register you can clobber. (Or in MSVC inline asm, let the compiler worry about saving it.) If registers are tight, use only one (if your src and dst are known not to overlap):

 mov   eax, [esi+2h]
 mov   [edi+0f1h], eax
 mov   eax, [esi+2h + 4]     ; write the +4 separately in the addressing mode as documentation
 mov   [edi+0f1h + 4], eax

If you can spare 2 registers, then yes it's probably better to do both loads and then both stores.
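
To see why the no-overlap condition matters for the one-register version, here is a small C model of both orderings. It is only a sketch: the function names are made up, and memcpy through a uint32_t stands in for each 32-bit mov.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One temporary: load low, store low, load high, store high.
   If dst[0..3] overlaps src[4..7], the first store clobbers the high
   half before it has been loaded. */
static void copy8_one_temp(unsigned char *dst, const unsigned char *src)
{
    uint32_t t;
    assert(dst != NULL && src != NULL);
    memcpy(&t, src, 4);      /* mov eax, [src]     */
    memcpy(dst, &t, 4);      /* mov [dst], eax     */
    memcpy(&t, src + 4, 4);  /* mov eax, [src + 4] */
    memcpy(dst + 4, &t, 4);  /* mov [dst + 4], eax */
}

/* Two temporaries: both loads happen before either store,
   so the copy is correct even when the 8-byte ranges overlap. */
static void copy8_two_temps(unsigned char *dst, const unsigned char *src)
{
    uint32_t lo, hi;
    assert(dst != NULL && src != NULL);
    memcpy(&lo, src, 4);      /* mov eax, [src]     */
    memcpy(&hi, src + 4, 4);  /* mov edx, [src + 4] */
    memcpy(dst, &lo, 4);      /* mov [dst], eax     */
    memcpy(dst + 4, &hi, 4);  /* mov [dst + 4], edx */
}
```

With a buffer holding bytes 1..8 and dst = src + 4, the two-temporary version leaves 1..8 at dst, while the one-temporary version leaves 1,2,3,4,1,2,3,4.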

Upvotes: 2

apangin

Reputation: 98540

Try

fild  qword ptr [esi+2h]
fistp qword ptr [edi+0f1h]

Upvotes: 1
