C++ vs python numpy complex arrays performance

Question

Can anyone tell me why these two programs have a huge difference in run time? I am simply multiplying two large complex arrays and comparing the time in python (numpy) and c++. I am using the -O3 flag with g++ to compile this C++ code. I find that the huge difference comes only when I use complex floats in C++, its more than 20 times faster in numpy.

python code:

import numpy as np
import time


if __name__ == "__main__":

    # check the data type is the same
    a = np.zeros((1), dtype=np.complex128)
    a[0] = np.complex(3.4e38,3.5e38)
    print(a)
    b = np.zeros((1), dtype=np.complex64)
    b[0] = np.complex(3.4e38,3.5e38)
    print(b)  # imaginary part is infinity

    length = 5000;
    A = np.ones((length), dtype=np.complex64) * np.complex(1,1)
    B = np.ones((length), dtype=np.complex64) * np.complex(1,0)

    num_iterations = 1000000
    time1 = time.time()
    for _ in range(num_iterations):
        A *= B

    time2 = time.time()
    duration = ((time2 - time1)*1e6)/num_iterations
    print(duration)

C++ code:

#include 
#include 
#include 
using namespace std::chrono;
using namespace std;

int main()
{

  // check the data type is the same
  complex a = complex(3.4e38, 3.5e38);
  cout << a << endl;
  complex b = complex(3.4e38, 3.5e38);
  cout << b << endl;  // imaginary part is infinity

  const int length = 5000;
  static complex A[length];
  static complex B[length];

  for(int i=0; i < length; i++) {
    A[i] = complex(1,1);
    B[i] = complex(1,0);
  }

  int num_iterations = 1000000;
  auto time1 = high_resolution_clock::now();
  for(int k=0; k < num_iterations; k++)
    for(int i=0; i < length; i++)
      A[i] *= B[i];

  auto time2 = high_resolution_clock::now();

  auto duration = duration_cast(time2 - time1);
  cout << "average time:" << duration.count() / num_iterations << endl;

}

Mark Borgerding · Accepted Answer

The C++ compiler is doing some extra checking gymnastics for you in order to properly handle NaNs and other such "standard" behavior. If you add the -ffast-math optimization flag, you'll get more sane speed, but less "standard" behavior. e.g. complex(inf,0)*complex(inf,0) won't be evaluated as complex(inf,0). Do you really care?

numpy is doing what makes sense, not hindered by a narrow reading of the C++ standard.

e.g. until very recent g++ versions, the latter of the following functions is much faster unless -ffast-math is used.

complex mul1( complex a,complex b)
{
    return a*b;
}

complex mul2( complex a,complex b)
{
    float * fa = reinterpret_cast(&a);
    const float * fb = reinterpret_cast(&b);
    float cr = fa[0]*fb[0] - fa[1]*fb[1];
    float ci = fa[0]*fb[1] + fa[1]*fb[0];
    return complex(cr,ci);
}

You can experiment with this on https://godbolt.org/z/kXPgCh for the assembly output and how the former function defaults to calling __mulsc3

P.S. Ready for another wave of anger at what the C++ standard says about std::complex? Can you guess how std::norm must be implemented by default? Play along. Follow the link and spend ten seconds thinking about it.

Spoiler: it probably is using a sqrt then squaring it.

C++ vs python numpy complex arrays performance

Answers (1)

Related Questions