Reputation: 3503
I'm porting some applications that do a lot of text processing from 32-bit to 64-bit Delphi, and noticed an extreme change in processing speed. I ran some tests with a few procedures; for example, this one takes more than twice as long in the 64-bit build as in the 32-bit build (2000+ ms compared to ~900 ms).
Is this normal?
function IsStrANumber(const S: AnsiString): Boolean;
var
  P: PAnsiChar;
begin
  Result := False;
  P := PAnsiChar(S);
  while P^ <> #0 do begin
    if not (P^ in ['0'..'9']) then Exit;
    Inc(P);
  end;
  Result := True;
end;
procedure TForm11.Button1Click(Sender: TObject);
const
  x = '1234567890';
var
  a, y, z: Integer;
begin
  z := GetTickCount;
  for a := 1 to 99999999 do begin
    if IsStrANumber(x) then y := 0; //StrToInt(x);
  end;
  Caption := IntToStr(GetTickCount - z);
end;
Upvotes: 26
Views: 4894
Reputation: 97
Here are two functions (IsInteger2 and IsInteger3; IsInteger1 is the Val-based baseline). The first checks only for positive numbers. The second checks for negative numbers as well, and is not limited in size. The second one is 4x faster than the regular Val.
function IsInteger1(const S: String): Boolean; overload;
var
  E: Integer;
  Value: Integer;
begin
  Val(S, Value, E);
  Result := E = 0;
end;
function IsInteger2(const S: String): Boolean; inline;
var
  I: Integer;
begin
  Result := False;
  I := 0;
  while True do
  begin
    case Ord(S[I+1]) of
      0: Break;
      $30..$39:
        case Ord(S[I+2]) of
          0: Break;
          $30..$39:
            case Ord(S[I+3]) of
              0: Break;
              $30..$39:
                case Ord(S[I+4]) of
                  0: Break;
                  $30..$39:
                    case Ord(S[I+5]) of
                      0: Break;
                      $30..$39:
                        case Ord(S[I+6]) of
                          0: Break;
                          $30..$39:
                            case Ord(S[I+7]) of
                              0: Break;
                              $30..$39:
                                case Ord(S[I+8]) of
                                  0: Break;
                                  $30..$39:
                                    case Ord(S[I+9]) of
                                      0: Break;
                                      $30..$39:
                                        case Ord(S[I+10]) of
                                          0: Break;
                                          $30..$39: Inc(I, 10);
                                        else
                                          Exit;
                                        end;
                                    else
                                      Exit;
                                    end;
                                else
                                  Exit;
                                end;
                            else
                              Exit;
                            end;
                        else
                          Exit;
                        end;
                    else
                      Exit;
                    end;
                else
                  Exit;
                end;
            else
              Exit;
            end;
        else
          Exit;
        end;
    else
      Exit;
    end;
  end;
  Result := True;
end;
function IsInteger3(const S: String): Boolean; inline;
var
  I: Integer;
begin
  Result := False;
  case Ord(S[1]) of
    $2D,
    $30 .. $39:
      begin
        I := 1;
        while True do
          case Ord(S[I + 1]) of
            0: Break;
            $30 .. $39:
              case Ord(S[I + 2]) of
                0: Break;
                $30 .. $39:
                  case Ord(S[I + 3]) of
                    0: Break;
                    $30 .. $39:
                      case Ord(S[I + 4]) of
                        0: Break;
                        $30 .. $39:
                          case Ord(S[I + 5]) of
                            0: Break;
                            $30 .. $39:
                              case Ord(S[I + 6]) of
                                0: Break;
                                $30 .. $39:
                                  case Ord(S[I + 7]) of
                                    0: Break;
                                    $30 .. $39:
                                      case Ord(S[I + 8]) of
                                        0: Break;
                                        $30 .. $39:
                                          case Ord(S[I + 9]) of
                                            0: Break;
                                            $30 .. $39:
                                              case Ord(S[I + 10]) of
                                                0: Break;
                                                $30 .. $39:
                                                  case Ord(S[I + 11]) of
                                                    0: Break;
                                                    $30 .. $39:
                                                      case Ord(S[I + 12]) of
                                                        0: Break;
                                                        $30 .. $39:
                                                          case Ord(S[I + 13]) of
                                                            0: Break;
                                                            $30 .. $39: Inc(I, 13);
                                                          else
                                                            Exit;
                                                          end;
                                                      else
                                                        Exit;
                                                      end;
                                                  else
                                                    Exit;
                                                  end;
                                              else
                                                Exit;
                                              end;
                                          else
                                            Exit;
                                          end;
                                      else
                                        Exit;
                                      end;
                                  else
                                    Exit;
                                  end;
                              else
                                Exit;
                              end;
                          else
                            Exit;
                          end;
                      else
                        Exit;
                      end;
                  else
                    Exit;
                  end;
              else
                Exit;
              end;
          else
            Exit;
          end;
      end;
  else
    Exit;
  end;
  Result := True;
end;
Upvotes: 1
Reputation: 34899
The test P^ in ['0'..'9'] is slow in 64-bit.
I added an inlined function that tests the lower and upper boundary instead of using the in [] test, plus a test for an empty string.
function IsStrANumber(const S: AnsiString): Boolean; inline;
var
  P: PAnsiChar;
begin
  Result := False;
  P := Pointer(S);
  if (P = nil) then
    Exit;
  while P^ <> #0 do begin
    if (P^ < '0') then Exit;
    if (P^ > '9') then Exit;
    Inc(P);
  end;
  Result := True;
end;
Benchmark results:
           x32    x64
--------------------
hikari    1420   3963
LU RD     1029   1060
In 32-bit, the main speed difference comes from inlining and from the fact that P := PAnsiChar(S); calls an external RTL routine for a nil check before assigning the pointer value, while P := Pointer(S); just assigns the pointer.
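The difference between the two casts can be seen with an empty string. This is a minimal sketch, not part of the original benchmark; the helper names RawIsNil and CastIsNil are hypothetical and exist only for illustration.

```pascal
program PointerCastDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: True when the raw string reference is nil,
// which is how Delphi represents an empty AnsiString.
function RawIsNil(const S: AnsiString): Boolean;
begin
  Result := Pointer(S) = nil;
end;

// Hypothetical helper: True when the PAnsiChar cast yields nil.
// For an empty string the cast substitutes a pointer to a #0
// terminator, so the result can be dereferenced safely.
function CastIsNil(const S: AnsiString): Boolean;
begin
  Result := PAnsiChar(S) = nil;
end;

var
  Empty, Digits: AnsiString;
begin
  Empty := '';
  Digits := '123';
  Writeln('Pointer(Empty) = nil:   ', RawIsNil(Empty));
  Writeln('PAnsiChar(Empty) = nil: ', CastIsNil(Empty));
  Writeln('Pointer(Digits) = nil:  ', RawIsNil(Digits));
end.
```

This is why the Pointer(S) version needs its own explicit nil test before dereferencing P, while the PAnsiChar(S) version does not.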
Observing that the goal here is to test whether a string is a number and then convert it, why not use the RTL TryStrToInt(), which does it all in one step and handles signs and blanks as well?
Often when profiling and optimizing routines, the most important thing is to find the right approach to the problem.
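As a rough illustration of the TryStrToInt() approach, here is a minimal sketch. The wrapper ParseOrFallback and its Fallback parameter are hypothetical names introduced for the example; the unit is written as SysUtils, which may need to be System.SysUtils depending on the project's unit scope settings.

```pascal
program TryStrToIntDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical wrapper: validates and converts in one call.
// Returns Fallback when S is not a valid 32-bit integer.
function ParseOrFallback(const S: string; Fallback: Integer): Integer;
begin
  if not TryStrToInt(S, Result) then
    Result := Fallback;
end;

begin
  Writeln(ParseOrFallback('1234567890', -1)); // valid, fits in an Integer
  Writeln(ParseOrFallback('-42', -1));        // sign is handled
  Writeln(ParseOrFallback('12a4', -1));       // invalid, yields the fallback
end.
```

Unlike the digit-scanning functions, this also rejects values that do not fit in an Integer, which may or may not be what the calling code wants.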
Upvotes: 3
Reputation: 28806
There is currently no solution for this. It is caused by the fact that most of the string routines in 64-bit are compiled with PUREPASCAL defined, in other words as plain Delphi with no assembler, while many of the important string routines in 32-bit were written in assembler by the FastCode project.
Currently, there are no FastCode equivalents in 64 bit, and I assume that the developer team will try to eliminate assembler anyway, especially since they are moving to more platforms.
This means that optimization of the generated code becomes more and more important. I hope that the announced move to an LLVM backend will speed up much of the code considerably, so pure Delphi code is not such a problem anymore.
So sorry, no solution, but perhaps an explanation.
As of XE4, quite a few FastCode routines have replaced the unoptimized routines I talk about in the paragraphs above. They are usually still PUREPASCAL, but they are nevertheless well optimized. So the situation is not as bad as it used to be. The TStringHelper and plain string routines still display some bugs and some extremely slow code in OS X (especially where conversion from Unicode to Ansi or vice versa is concerned), but the Win64 part of the RTL seems to be a lot better.
Upvotes: 36
Reputation: 1095
The code can be written like this with good performance results:
function IsStrANumber(const S: AnsiString): Boolean; inline;
var
  P: PAnsiChar;
begin
  Result := False;
  P := PAnsiChar(S);
  while True do
  begin
    case PByte(P)^ of
      0: Break;
      $30..$39: Inc(P);
    else
      Exit;
    end;
  end;
  Result := True;
end;
Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
Intel(R) Pentium(R) D CPU 3.40GHz
Unrolling the above loop can result in faster execution:
function IsStrANumber(const S: AnsiString): Boolean; inline;
type
  TStrData = packed record
    A: Byte;
    B: Byte;
    C: Byte;
    D: Byte;
    E: Byte;
    F: Byte;
    G: Byte;
    H: Byte;
  end;
  PStrData = ^TStrData;
var
  P: PStrData;
begin
  Result := False;
  P := PStrData(PAnsiChar(S));
  while True do
  begin
    case P^.A of
      0: Break;
      $30..$39:
        case P^.B of
          0: Break;
          $30..$39:
            case P^.C of
              0: Break;
              $30..$39:
                case P^.D of
                  0: Break;
                  $30..$39:
                    case P^.E of
                      0: Break;
                      $30..$39:
                        case P^.F of
                          0: Break;
                          $30..$39:
                            case P^.G of
                              0: Break;
                              $30..$39:
                                case P^.H of
                                  0: Break;
                                  $30..$39: Inc(P);
                                else
                                  Exit;
                                end;
                            else
                              Exit;
                            end;
                        else
                          Exit;
                        end;
                    else
                      Exit;
                    end;
                else
                  Exit;
                end;
            else
              Exit;
            end;
        else
          Exit;
        end;
    else
      Exit;
    end;
  end;
  Result := True;
end;
Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
Intel(R) Pentium(R) D CPU 3.40GHz
If you also apply what Arnaud Bouchez said you can make it even faster.
Upvotes: 6
Reputation: 740
The benefit of 64-bit is in address space, not speed (unless your code is limited by addressable memory).
Historically, this sort of character manipulation code has always been slower on wider machines. It was true moving from the 16-bit 8088/8086 to the 32-bit 386. Putting an 8-bit char in a 64-bit register is a waste of memory bandwidth & cache.
For speed, you can avoid char variables, use pointers, use lookup tables, use bit-parallelism (manipulate 8 chars in one 64-bit word), or use the SSE/SSE2... instructions. Obviously, some of these will make your code CPUID dependent. Also, open the CPU window while debugging, and look for the compiler doing stupid things "for" you like silent string conversions (especially around calls).
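The lookup-table idea from the list above could look roughly like this. It is a minimal sketch under stated assumptions: the names IsDigit, InitDigitTable and IsStrANumberLUT are hypothetical, and no timing claim is made for it; it would need to be benchmarked against the other versions in this thread.

```pascal
program DigitTableDemo;
{$APPTYPE CONSOLE}

var
  // IsDigit[b] is True only for the byte values of '0'..'9'.
  IsDigit: array[Byte] of Boolean;

// Fills the table once, before any lookups are made.
procedure InitDigitTable;
var
  B: Integer;
begin
  for B := Low(Byte) to High(Byte) do
    IsDigit[B] := (B >= Ord('0')) and (B <= Ord('9'));
end;

// Returns True when S is non-empty and consists only of ASCII digits.
// Each character costs one table read instead of a set membership test.
function IsStrANumberLUT(const S: AnsiString): Boolean;
var
  P: PByte;
begin
  Result := False;
  P := Pointer(S);          // nil for an empty string
  if P = nil then
    Exit;
  while P^ <> 0 do
  begin
    if not IsDigit[P^] then
      Exit;
    Inc(P);
  end;
  Result := True;
end;

begin
  InitDigitTable;
  Writeln(IsStrANumberLUT('1234567890'));
  Writeln(IsStrANumberLUT('12a4'));
  Writeln(IsStrANumberLUT(''));
end.
```

Note that this sketch treats an empty string as not a number, which differs from the original IsStrANumber in the question.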
You might try looking at some of the native Pascal routines in the FastCode Library. For example, PosEx_Sha_Pas_2, while not as fast as the assembler versions, is faster than the RTL code (in 32-bit).
Upvotes: 1
Reputation: 43033
Try to avoid any string allocation in your loop.
In your case, the stack preparation of the x64 calling convention could be involved. Did you try declaring IsStrANumber as inline?
I guess this will make it faster.
function IsStrANumber(P: PAnsiChar): Boolean; inline;
begin
  Result := False;
  if P = nil then Exit;
  while P^ <> #0 do
    if not (P^ in ['0'..'9']) then
      Exit
    else
      Inc(P);
  Result := True;
end;
procedure TForm11.Button1Click(Sender: TObject);
const
  x = '1234567890';
var
  a, y, z: Integer;
  s: AnsiString;
begin
  z := GetTickCount;
  s := x;
  for a := 1 to 99999999 do begin
    if IsStrANumber(Pointer(s)) then y := 0; //StrToInt(x);
  end;
  Caption := IntToStr(GetTickCount - z);
end;
The "pure pascal" version of the RTL is indeed the cause of slowness here...
Note that it is even worse with the FPC 64-bit compiler when compared to its 32-bit version, so the Delphi compiler is not the only one. 64-bit does not mean "faster", whatever marketing says! Sometimes it is even the opposite (e.g. the JRE is known to be slower on 64-bit, and a new x32 model is to be introduced in Linux because of pointer size).
Upvotes: 6