UXkQEZ7

Reputation: 1184

numpy's QR appears faster than scipy's - why?

For some reason, numpy's QR decomposition is always faster than scipy's which seems strange because scipy is supposed to include all of numpy plus more advanced features. Any idea why?

import numpy as np
import numpy.linalg as nla
import scipy.linalg as la

A = np.random.randn(4000, 4000)

%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)
%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)

3 loops, best of 3: 1.17 s per loop

3 loops, best of 3: 1.21 s per loop

3 loops, best of 3: 1.05 s per loop

3 loops, best of 3: 1.21 s per loop

The difference is more noticeable with mode="raw".

%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')

3 loops, best of 3: 440 ms per loop

3 loops, best of 3: 612 ms per loop

3 loops, best of 3: 436 ms per loop

3 loops, best of 3: 758 ms per loop
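One detail worth keeping in mind when comparing the two `'raw'` calls: NumPy's `mode='raw'` returns the factored form straight from LAPACK's `dgeqrf`, i.e. the Householder reflectors plus their `tau` scalars, without ever assembling an explicit Q. A minimal sketch of what that output looks like:

```python
import numpy as np

A = np.random.randn(6, 4)

# mode='raw' returns the dgeqrf output: 'h' holds R in its upper
# triangle and the Householder reflectors below it, and 'tau' holds
# the reflector scaling factors. No orthogonal Q is ever formed.
h, tau = np.linalg.qr(A, mode='raw')

# NumPy returns h transposed (Fortran-order convention), so for a
# 6 x 4 input h has shape (4, 6) and tau has shape (4,).
print(h.shape, tau.shape)
```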

I'm using Intel MKL for both, and I'm on the latest development versions:

numpy version: 1.11.0.dev0+90a1a9f

scipy version: 0.17.0.dev0+8003fab
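For reference, build configurations like the ones below can be printed directly from each package; if the BLAS/LAPACK entries ever differ between the two, the two `qr` calls are not hitting the same library:

```python
import numpy as np
import scipy

# Each package records the BLAS/LAPACK it was built against.
print(np.__version__, scipy.__version__)
np.__config__.show()
scipy.__config__.show()
```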

NumPy config:
blas_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']

SciPy config:
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']

Upvotes: 4

Views: 1084

Answers (1)

smarie

Reputation: 5156

Very interesting post! Facing a similar question, I came up with the following experiment, which does not rely on timeit but instead explores various matrix shapes.

On this particular environment it does not seem to confirm the assumption that scipy is slower, but it might be a useful starting point for other experiments.

"""
Compares Scipy and Numpy QR execution times for various n x m random matrices
"""
import time

from scipy import linalg
import numpy as np
import matplotlib.pyplot as plt


n_feature_grid = [100, 500, 1000, 2000, 4000, 7500, 10000]
max_timeit_nb = 100
m_ratio_grid = [0.05, 0.1, 0.3, 1.0]

exec_times_scipy = np.full((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb), np.nan)
exec_times_numpy = np.full((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb), np.nan)
rng = np.random.RandomState(69)

for i, n in enumerate(n_feature_grid):
    for j, m_ratio in enumerate(m_ratio_grid):
        m = int(np.ceil(m_ratio * n))
        if m * n < 100 * 100:
            timeit_nb = max_timeit_nb
        elif m * n < 1000 * 100:
            timeit_nb = 50
        elif m * n < 3000 * 10000:
            timeit_nb = 10
        else:
            timeit_nb = 5

        print("Experiment for matrix %i x %i, repeated %i times" % (n, m, timeit_nb))

        for k, nb in enumerate(range(timeit_nb)):
            X = rng.randn(n, m)

            # numpy
            start_time = time.perf_counter()
            Qn, Rn = np.linalg.qr(X, mode='reduced')
            exec_times_numpy[i, j, k] = time.perf_counter() - start_time

            # scipy
            start_time = time.perf_counter()
            Qs, Rs = linalg.qr(X, mode='economic')
            exec_times_scipy[i, j, k] = time.perf_counter() - start_time

            np.testing.assert_equal(Qs, Qn)
            np.testing.assert_equal(Rs, Rn)


avg_exec_times_scipy = exec_times_scipy.mean(axis=2)
std_exec_times_scipy = exec_times_scipy.std(axis=2)
avg_exec_times_numpy = exec_times_numpy.mean(axis=2)
std_exec_times_numpy = exec_times_numpy.std(axis=2)

avg_exec_times_ratio = np.nanmean((exec_times_scipy / exec_times_numpy), axis=2)
std_exec_times_ratio = np.nanstd((exec_times_scipy / exec_times_numpy), axis=2)
# debug
# np.count_nonzero(np.isnan((exec_times_scipy / exec_times_numpy)), axis=2)

# Plot results
fig, ax = plt.subplots(1, figsize=(16, 12))
for k, m_ratio in enumerate(m_ratio_grid):
    ax.errorbar(n_feature_grid, avg_exec_times_ratio[:, k], yerr=std_exec_times_ratio[:, k],
                marker='x', linestyle='-', # color='r',
                label='scipy/numpy(m=%.2f*n)' % m_ratio)

ax.legend(loc='upper left')
ax.set_ylabel("Ratio scipy/numpy QR exec time.")
ax.set_xlabel("n_features")
ax.set_title("scipy/numpy execution time ratio for QR on a n x m random matrix")

ax.plot([n_feature_grid[0], n_feature_grid[-1]], [1., 1.], linestyle='--', color='k')

plt.show()

This generates the following graph: the ratio is mostly below 1 for small matrices, but tends (though less clearly) to rise above 1 for large non-square matrices:

(plot: scipy/numpy QR execution-time ratio vs n_features, one curve per m/n ratio)

The matrix sizes and number of repetitions appear in the logs:

Experiment for matrix 100 x 5, repeated 100 times
Experiment for matrix 100 x 10, repeated 100 times
Experiment for matrix 100 x 30, repeated 100 times
Experiment for matrix 100 x 100, repeated 50 times
Experiment for matrix 500 x 25, repeated 50 times
Experiment for matrix 500 x 50, repeated 50 times
Experiment for matrix 500 x 150, repeated 50 times
Experiment for matrix 500 x 500, repeated 10 times
Experiment for matrix 1000 x 50, repeated 50 times
Experiment for matrix 1000 x 100, repeated 10 times
Experiment for matrix 1000 x 300, repeated 10 times
Experiment for matrix 1000 x 1000, repeated 10 times
Experiment for matrix 2000 x 100, repeated 10 times
Experiment for matrix 2000 x 200, repeated 10 times
Experiment for matrix 2000 x 600, repeated 10 times
Experiment for matrix 2000 x 2000, repeated 10 times
Experiment for matrix 4000 x 200, repeated 10 times
Experiment for matrix 4000 x 400, repeated 10 times
Experiment for matrix 4000 x 1200, repeated 10 times
Experiment for matrix 4000 x 4000, repeated 10 times
Experiment for matrix 7500 x 375, repeated 10 times
Experiment for matrix 7500 x 750, repeated 10 times
Experiment for matrix 7500 x 2250, repeated 10 times
Experiment for matrix 7500 x 7500, repeated 5 times
Experiment for matrix 10000 x 500, repeated 10 times
Experiment for matrix 10000 x 1000, repeated 10 times
Experiment for matrix 10000 x 3000, repeated 5 times
Experiment for matrix 10000 x 10000, repeated 5 times

The environment I used is already quite old:

numpy                     1.19.1           py36h5510c5b_0
numpy-base                1.19.1           py36ha3acd2a_0
python                    3.6.8                h9f7ef89_7
scipy                     1.5.0            py36h9439919_0

(conda 4.8.5 on windows 10)

Note that I did not use the 'raw' mode, since (if I understand correctly) it returns the factorization in LAPACK's internal form rather than as explicit Q and R matrices. You might wish to change this to see the effect.
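On a related note: the script above compares the numpy and scipy results with assert_equal, i.e. bit-for-bit. That happens to hold when both dispatch to the same LAPACK routine, but QR is in general only unique up to the signs of Q's columns, so a reconstruction-based check is more robust if the two libraries ever take different code paths. A minimal sketch:

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
X = rng.randn(200, 50)

Qn, Rn = np.linalg.qr(X, mode='reduced')
Qs, Rs = linalg.qr(X, mode='economic')

# Instead of exact equality, verify that each factorization
# reconstructs X and that the Q factors are orthonormal.
assert np.allclose(Qn @ Rn, X)
assert np.allclose(Qs @ Rs, X)
assert np.allclose(Qn.T @ Qn, np.eye(50))
assert np.allclose(Qs.T @ Qs, np.eye(50))
print("both factorizations are valid")
```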

This investigation came out of a scikit-learn experiment; see this ticket.

Upvotes: 1
