Reputation: 24086
Main question
Does somebody have a fast sin()
implementation for x64?
It doesn't need to be pure pascal.
Explanation
I've got a VCL application that in some situations runs a lot slower when it's compiled for x64.
It does a lot of floating point 3d calculations, and I've tracked this down to the fact that System.Sin()
and System.Cos()
are a lot slower on x64, when the input values become large.
I've timed it by creating a simple test application that measures how long it takes to calculate sin(x)
, with different values for x, and the differences are HUGE:
call: x64: x86:
Sin(1) 16 ms 20 ms
Sin(10) 30 ms 20 ms
Sin(100) 32 ms 20 ms
Sin(1000) 34 ms 21 ms
Sin(10000) 30 ms 21 ms
Sin(100000) 30 ms 16 ms
Sin(1000000) 35 ms 20 ms
Sin(10000000) 581 ms 20 ms
Sin(100000000) 1026 ms 21 ms
Sin(1000000000) 1187 ms 22 ms
Sin(10000000000) 1320 ms 21 ms
Sin(100000000000) 1456 ms 20 ms
Sin(1000000000000) 1581 ms 17 ms
Sin(10000000000000) 1717 ms 22 ms
Sin(100000000000000) 1846 ms 23 ms
Sin(1E15) 1981 ms 21 ms
Sin(1E16) 2100 ms 21 ms
Sin(1E17) 2240 ms 22 ms
Sin(1E18) 2372 ms 18 ms
etc etc etc
What you see here is that sin(1E5)
runs about 300 times as fast as sin(1E8)
.
In case you're interested, I've created the above table like this:
{$APPTYPE CONSOLE}
program SinTest;
uses Diagnostics, Math, SysUtils;
var
i : Integer;
x : double;
sw: TStopwatch;
begin
x := 1;
while X < 1E18 do
begin
sw := TStopwatch.StartNew;
for i := 1 to 500000 do
System.Sin(x);
// WriteLn(System.sin(x), #9,System.Sin(fmod(x,2*pi)));
sw.Stop;
WriteLn(' ', ('Sin(' + round(x).ToString + ')'):20, ' ', sw.ElapsedMilliseconds,' ms');
x := x * 10;
end;
WriteLn('Press any key to continue');
readln;
end.
Notes:
There are some questions on StackOverflow regarding faster sine functions, but none of them have source code that is useful to port to Delphi, like this one: Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)
The rest of the x64 runs faster than it's 32-bit counterpart
I've found a bit of crappy workaround, by doing this:
Sin(FMod(x,2*pi))
. It provides the correct results, and it runs fast for larger numbers. For smaller numbers it's a bit slower of course.
Upvotes: 4
Views: 2128
Reputation: 31393
While this is probably to be rather strongly discouraged in user mode code (and is totally forbidden in kernel mode code), if you do want to keep the legacy x87 behaviour in your x64 code you could write a function like this :
function SinX87(x:double):double;
var
d : double;
asm
movsd qword ptr [rbp+8], xmm0
fld qword ptr [rbp+8]
fsin
fstp qword ptr [rbp+8]
movsd xmm0, qword ptr [rbp+8]
end;
This adds a bit of overhead since you have to pop the value out of the SSE register onto the stack, load it into the x87 unit, peform the calculation, pop the value back onto the stack, and then load it back into XMM0 for the function result. The sin
calculation is pretty heavy, though, so this is a relatively small overhead. I would only really do this if you needed to preserve whatever idiosyncracies of the x87's sin
implementation.
Other libraries exist that compute sin
more efficiently in x64 code than Delphi's purepascal routines. My overwhelming preference here would be to export a good set of C++ routines to a DLL instead. Also, as David said, using trig functions with ridiculously large arguments is not really a sensible thing to do anyway.
Upvotes: 3
Reputation: 24086
In case you're interested in my final solution:
I've experimented a bit, by doing this (as LU RD and e). – Jerry Coffin suggested):
function sin(x:double):double;
begin
if x<1E6 then
Result := system.sin(x)
else
Result := system.sin(fmod(x,2*pi));
end;
Maybe it has something to do with the predictability of the test code on my specific CPU, but smaller values were actually calculated faster if I didn't do the if
, and just always use fmod(). Strange, because some division needs to take place, which I'd expect to be slowed than comparing two values.
So this is what I end up using now:
function sin(const x: double): double; { inline; }
begin
{$IFDEF CPUX64}
Result := System.sin(Math.FMod(x,2*pi));
{$ELSE}
Result := System.sin(x);
{$ENDIF}
end;
By the way adding inline
, it ran 1.5 times faster even. It then runs exactly as fast as J...'s function on my machine. But even without Inline this is already hundreds of times faster than System.Sin(), so I'm going for this.
Upvotes: 2