Wouter van Nifterick
Wouter van Nifterick

Reputation: 24086

Faster sin() for x64

Main question

Does somebody have a fast sin() implementation for x64? It doesn't need to be pure pascal.

Explanation

I've got a VCL application that in some situations runs a lot slower when it's compiled for x64.

It does a lot of floating point 3d calculations, and I've tracked this down to the fact that System.Sin() and System.Cos() are a lot slower on x64, when the input values become large.

I've timed it by creating a simple test application that measures how long it takes to calculate sin(x), with different values for x, and the differences are HUGE:

              call:     x64:     x86:
              Sin(1)   16 ms    20 ms
             Sin(10)   30 ms    20 ms
            Sin(100)   32 ms    20 ms
           Sin(1000)   34 ms    21 ms
          Sin(10000)   30 ms    21 ms
         Sin(100000)   30 ms    16 ms
        Sin(1000000)   35 ms    20 ms
       Sin(10000000)  581 ms    20 ms
      Sin(100000000) 1026 ms    21 ms
     Sin(1000000000) 1187 ms    22 ms
    Sin(10000000000) 1320 ms    21 ms
   Sin(100000000000) 1456 ms    20 ms
  Sin(1000000000000) 1581 ms    17 ms
 Sin(10000000000000) 1717 ms    22 ms
Sin(100000000000000) 1846 ms    23 ms
           Sin(1E15) 1981 ms    21 ms
           Sin(1E16) 2100 ms    21 ms
           Sin(1E17) 2240 ms    22 ms
           Sin(1E18) 2372 ms    18 ms
                etc    etc      etc

What you see here is that sin(1E5) runs about 300 times as fast as sin(1E8).

In case you're interested, I've created the above table like this:

{$APPTYPE CONSOLE}
program SinTest;

uses Diagnostics, Math, SysUtils;

var
  i : Integer;
  x : double;
  sw: TStopwatch;

begin
  x := 1;

  while X < 1E18 do
  begin
    sw    := TStopwatch.StartNew;
    for i := 1 to 500000 do
      System.Sin(x);

    // WriteLn(System.sin(x), #9,System.Sin(fmod(x,2*pi)));

    sw.Stop;

    WriteLn('    ', ('Sin(' + round(x).ToString + ')'):20, ' ', sw.ElapsedMilliseconds,' ms');

    x := x * 10;
  end;

  WriteLn('Press any key to continue');
  readln;
end.

Notes:

Upvotes: 4

Views: 2128

Answers (2)

J...
J...

Reputation: 31393

While this is probably to be rather strongly discouraged in user mode code (and is totally forbidden in kernel mode code), if you do want to keep the legacy x87 behaviour in your x64 code you could write a function like this :

function SinX87(x:double):double;
var
  d : double;
asm
  movsd qword ptr [rbp+8], xmm0
  fld qword ptr [rbp+8]
  fsin
  fstp qword ptr [rbp+8]
  movsd xmm0, qword ptr [rbp+8]
end;

This adds a bit of overhead since you have to pop the value out of the SSE register onto the stack, load it into the x87 unit, peform the calculation, pop the value back onto the stack, and then load it back into XMM0 for the function result. The sin calculation is pretty heavy, though, so this is a relatively small overhead. I would only really do this if you needed to preserve whatever idiosyncracies of the x87's sin implementation.

Other libraries exist that compute sin more efficiently in x64 code than Delphi's purepascal routines. My overwhelming preference here would be to export a good set of C++ routines to a DLL instead. Also, as David said, using trig functions with ridiculously large arguments is not really a sensible thing to do anyway.

Upvotes: 3

Wouter van Nifterick
Wouter van Nifterick

Reputation: 24086

In case you're interested in my final solution:

I've experimented a bit, by doing this (as LU RD and e). – Jerry Coffin suggested):

function sin(x:double):double;
begin
  if x<1E6 then
    Result := system.sin(x)
  else
    Result := system.sin(fmod(x,2*pi));
end;

Maybe it has something to do with the predictability of the test code on my specific CPU, but smaller values were actually calculated faster if I didn't do the if, and just always use fmod(). Strange, because some division needs to take place, which I'd expect to be slowed than comparing two values.

So this is what I end up using now:

function sin(const x: double): double; { inline; }
begin
  {$IFDEF CPUX64}
  Result := System.sin(Math.FMod(x,2*pi));
  {$ELSE}
  Result := System.sin(x);
  {$ENDIF}
end;

By the way adding inline, it ran 1.5 times faster even. It then runs exactly as fast as J...'s function on my machine. But even without Inline this is already hundreds of times faster than System.Sin(), so I'm going for this.

Upvotes: 2

Related Questions