Reputation: 14622
I have the following function and need to make it compatible with the 64-bit platform:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
mov ebx, Tab
mov ecx, Len
mov edx, Buf
@1: mov al, [edx]
xlat
mov [edx], al
inc edx
dec ecx
jnz @1
end;
Delphi XE5 raises the error [dcc64 Error] E2107 Operand size mismatch on the lines that use the Tab and Len parameters.
Unfortunately, I don't know assembler well enough to fix the issue myself. What should I change so that the function compiles?
Upvotes: 1
Views: 3660
Reputation: 10937
Why are you using assembler?
There is no good reason to!
This is a direct translation of your asm code to Delphi Pascal:
procedure ExecuteAsm(Tab, Buf: PByte; Len: DWORD);
begin
  repeat
    Buf^ := Tab[Buf^]; // replace the byte with its table entry (what xlat does)
    inc(Buf);
    dec(Len);
  until Len = 0;
end;
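But as you can see now, if Len is 0 the procedure will corrupt program memory: the body runs once before the test, and with overflow checking off dec(Len) then wraps around to a huge value instead of reaching 0.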
...
This code is better, because the while loop tests Len first and never executes the body when it is 0.
procedure ExecuteAsm(Tab, Buf: PByte; Len: cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    inc(Buf);
    dec(Len);
  end;
end;
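The difference matters because Len is unsigned. Here is a minimal sketch, assuming overflow and range checking are switched off: decrementing a Cardinal that is already 0 wraps around to the maximum value, so a repeat..until loop that tests Len = 0 only after the decrement would keep walking through memory. The program is for illustration only and is not part of the original routine.

```pascal
{$APPTYPE CONSOLE}
{$Q-} // assumption: overflow checking off; with it on, Dec may raise an error instead
{$R-} // assumption: range checking off
var
  Len: Cardinal;
begin
  Len := 0;
  Dec(Len);     // the step the repeat..until version performs before testing Len = 0
  Writeln(Len); // prints 4294967295, i.e. High(Cardinal)
end.
```

With the while version the test happens before the first decrement, so this wrap-around never occurs for Len = 0.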
However, if you still prefer assembler, you must preserve the ebx/rbx register, like this:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
push ebx //rbx
//... your code
pop ebx //rbx
end;
EDIT: Added 32-bit and 64-bit tests
Because HeartWare didn't do the homework set by David Heffernan, I did. The original test was written by David Heffernan; see HeartWare's comments. I made only a few small changes and added two more test cases. This directive is important: {$O+} //Turn on compiler optimisation... :)
{$APPTYPE CONSOLE}
uses
Diagnostics;
{$O+} //Turn on compiler optimisation... :)
procedure _asm_GJ(Tab, Buf : PByte; Len : Cardinal);
// 32-bit eax edx ecx
// 64-bit rcx rdx r8
asm
{$IFDEF CPUX64 }
test Len, Len
jz @exit
@loop:
movzx rax, [Buf]
mov al, byte ptr[Tab + rax]
mov [Buf],al
inc Buf
dec Len
jnz @loop
{$ELSE }
test Len, Len
jz @exit
push ebx
@loop:
movzx ebx, [Buf]
mov bl,byte ptr[Tab + ebx]
mov [Buf], bl
inc Buf
dec Len
jnz @loop
pop ebx
{$ENDIF }
@exit:
end;
procedure _asm_HeartWare(Tab, Buf : PByte; Len : Cardinal);
// 32-bit EAX EDX ECX
// 64-bit RCX RDX R8
asm
{$IFDEF CPUX64 }
XCHG R8,RCX
JECXZ @OUT
XOR RAX,RAX
@LOOP:
MOV AL,[RDX]
MOV AL,[R8+RAX]
MOV [RDX],AL
INC RDX
DEC ECX
JNZ @LOOP
// LOOP @LOOP
{$ELSE }
JECXZ @OUT
PUSH EBX
XCHG EAX,EBX
XOR EAX,EAX
@LOOP:
MOV AL,[EDX+ECX-1]
MOV AL,[EBX+EAX]
MOV [EDX+ECX-1],AL
DEC ECX
JNZ @LOOP
// LOOP @LOOP
POP EBX
{$ENDIF }
@OUT:
end;
procedure _pas_normal(Tab, Buf: PByte; Len: Cardinal);
begin
while Len > 0 do begin
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
end;
end;
procedure _pas_inline(Tab, Buf: PByte; Len: Cardinal); inline;
begin
while Len > 0 do begin
Buf^ := Tab[Buf^];
inc(Buf);
dec(Len);
end;
end;
var
Stopwatch: TStopwatch;
i: Integer;
x, y: array [0 .. 1023] of Byte;
procedure refresh;
var
i: Integer; // a for-loop control variable must be local to the procedure
begin
for i := low(x) to high(x) do
begin
x[i] := i mod 256;
y[i] := (i + 20) mod 256;
end;
end;
begin
{$IFDEF CPUX64 }
Writeln('64 bit mode');
{$ELSE }
Writeln('32 bit mode');
{$ENDIF }
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_asm_HeartWare(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('asm HeartWare : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_asm_GJ(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('asm GJ : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_pas_normal(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('pas normal : ', Stopwatch.ElapsedMilliseconds, 'ms');
refresh;
Stopwatch := TStopwatch.StartNew;
for i := 1 to 1000000 do
begin
_pas_inline(PByte(@x), PByte(@y), SizeOf(x));
end;
Writeln('pas inline : ', Stopwatch.ElapsedMilliseconds, 'ms');
Readln;
end.
And results...
Conclusion...
There is almost nothing to say! The numbers speak for themselves...
The Delphi compiler is good, hmm, very good!
I also included another optimised asm procedure in the test, because HeartWare's asm optimisation isn't a real optimisation.
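One caveat when reading such a benchmark: the test program fills x with i mod 256, which is an identity table, so the buffer is never actually changed and a wrong routine could go unnoticed. Below is a small sketch of a correctness check that could be run before timing anything. It assumes Delphi syntax; the names TTranslateProc, PasTranslate and Check are hypothetical and do not appear in the original test.

```pascal
{$APPTYPE CONSOLE}
type
  // Hypothetical procedural type matching the signature of the routines under test
  TTranslateProc = procedure(Tab, Buf: PByte; Len: Cardinal);

// Plain Pascal reference implementation (same logic as the while version above)
procedure PasTranslate(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

// Runs Proc with a non-identity table and compares against the expected bytes
function Check(Proc: TTranslateProc): Boolean;
var
  Tab: array [0 .. 255] of Byte;
  Buf: array [0 .. 1023] of Byte;
  I: Integer;
begin
  for I := 0 to High(Tab) do
    Tab[I] := Byte(255 - I);          // table maps b to 255 - b
  for I := 0 to High(Buf) do
    Buf[I] := Byte(I);                // input bytes 0..255, repeated
  Proc(@Tab[0], @Buf[0], SizeOf(Buf));
  Result := True;
  for I := 0 to High(Buf) do
    if Buf[I] <> Byte(255 - Byte(I)) then
      Result := False;
end;

begin
  Writeln('PasTranslate OK: ', Check(PasTranslate));
end.
```

Any of the asm variants with the same signature could be passed to Check in the same way.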
Upvotes: 5
Reputation: 8243
NOTE: Read the accepted answer by GJ, as it contains a Pascal implementation that beats the crap out of my version. I seem to confuse the compiler by using ABSOLUTE to work around the signature problem in GJ's implementation, which is one of the reasons why I didn't use his code as the Pascal version. But even when recoded to match the signature, with explicit type casts inside the routine, his code was still much faster than my Pascal version and on par with the optimized assembler version. So, as stated in my own reply and all the others: use a Pascal implementation when possible, unless it is a time-critical routine called a gazillion times and an actual benchmark shows that the ASM version is significantly faster (which, in my defense, my benchmark did show).
{$IFDEF MSWINDOWS }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD); ASSEMBLER; Register;
// 32-bit EAX EDX ECX
// 64-bit RCX RDX R8
ASM
{$IFDEF CPUX64 }
XCHG R8,RCX
JECXZ @OUT
XOR RAX,RAX
@LOOP:
MOV AL,[RDX]
MOV AL,[R8+RAX]
MOV [RDX],AL
INC RDX
DEC ECX
JNZ @LOOP
// LOOP @LOOP
{$ELSE }
JECXZ @OUT
PUSH EBX
XCHG EAX,EBX
XOR EAX,EAX
@LOOP:
MOV AL,[EDX+ECX-1]
MOV AL,[EBX+EAX]
MOV [EDX+ECX-1],AL
DEC ECX
JNZ @LOOP
// LOOP @LOOP
POP EBX
{$ENDIF }
@OUT:
END;
{$ELSE }
PROCEDURE ExecuteAsm(Tab,Buf : POINTER ; Len : DWORD);
VAR
TabP : PByte ABSOLUTE Tab;
BufP : PByte ABSOLUTE Buf;
I : Cardinal;
BEGIN
FOR I:=1 TO Len DO BEGIN
BufP^:=TabP[BufP^];
INC(BufP)
END
END;
{$ENDIF }
This should be a valid substitution for all currently supported compilers and platforms. While I agree that it might be better to use the pure Pascal version, it does lead to some horrendous assembly code with lots of unnecessary reloading of registers (at least in 32-bit), so the pure assembly version is definitely faster.
That said, unless you run it a gazillion times, you probably won't notice it in actual use, and the pure Pascal routine will most likely perform adequately. Only you can determine whether the speed improvement is necessary.
Anyway, here are the timings for executing the PROCEDURE 100,000 times on a 256-byte array (using XE5):
32-bit ASM: 47 ms
64-bit ASM: 47 ms
32-bit PAS: 63 ms
64-bit PAS: 78 ms
and the timings for running it 10,000,000 times in RELEASE configuration:
32-bit ASM: 5281 ms
64-bit ASM: 5281 ms
32-bit PAS: 7765 ms
64-bit PAS: 10031 ms
Still, the ASM version beats the Pascal version in all cases...
And the hand-optimized assembly version performed even better:
32-bit ASM: 1906 ms
64-bit ASM: 1859 ms
32-bit PAS: 7781 ms
64-bit PAS: 10015 ms
And with 10,000 runs over 25,600 bytes instead:
32-bit ASM: 218 ms
64-bit ASM: 172 ms
32-bit PAS: 734 ms
64-bit PAS: 937 ms
In ALL cases, my ASM routine beats the crap out of the compiler's. I simply can't reproduce your timings... What code and compiler did you use?
The actual code that measures the time is as follows (for 10,000 runs over 25,600 bytes):
T:=GetTickCount;
FOR I:=1 TO 10000 DO ExecuteAsm(TAB,BUF,25600);
T:=GetTickCount-T;
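GetTickCount typically advances in steps of roughly 10 to 16 ms, so short runs are only a few ticks long and small differences can be noise. Below is a sketch of the same kind of measurement using TStopwatch, which GJ's test program also uses. It assumes the System.Diagnostics unit name of recent Delphi versions; the Translate routine and the table contents are placeholders for illustration, not the routines timed above.

```pascal
{$APPTYPE CONSOLE}
uses
  System.Diagnostics; // assumption: unit-scoped name; older setups may use Diagnostics

// Placeholder routine to time (plain Pascal table translation)
procedure Translate(Tab, Buf: PByte; Len: Cardinal);
begin
  while Len > 0 do
  begin
    Buf^ := Tab[Buf^];
    Inc(Buf);
    Dec(Len);
  end;
end;

var
  Tab: array [0 .. 255] of Byte;
  Buf: array [0 .. 25599] of Byte;
  SW: TStopwatch;
  I: Integer;
begin
  for I := 0 to High(Tab) do
    Tab[I] := Byte(255 - I);   // arbitrary non-identity table
  for I := 0 to High(Buf) do
    Buf[I] := Byte(I);         // arbitrary input data

  SW := TStopwatch.StartNew;   // high-resolution timer where the platform provides one
  for I := 1 to 10000 do
    Translate(@Tab[0], @Buf[0], SizeOf(Buf));
  Writeln('elapsed: ', SW.ElapsedMilliseconds, ' ms');
end.
```

Repeating the measurement a few times and comparing the spread gives a better idea of whether a difference between two routines is real.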
Upvotes: 3
Reputation: 596156
That assembly code is essentially just doing the following, which would work in both 32-bit and 64-bit:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
var
pBuf: PByte;
begin
pBuf := PByte(Buf);
repeat
pBuf^ := PByte(Tab)[pBuf^];
Inc(pBuf);
Dec(Len);
until Len = 0;
end;
So why not just use plain Delphi code and let the compiler deal with the assembly?
Upvotes: 5
Reputation: 14622
I'm not at all sure that it will work correctly, but it compiles successfully:
procedure ExecuteAsm(Tab, Buf: Pointer; Len: DWORD);
asm
mov rbx, Tab
mov ecx, Len
mov rdx, Buf
@1: mov al, [rdx]
xlat
mov [rdx], al
inc rdx
dec ecx
jnz @1
end;
Is it the correct answer?
Upvotes: 0