Reputation: 85
I have been using Fortran for my computational physics related work for a long time, and recently started learning and playing around with Python. I am aware of the fact that being an interpreted language Python is generally slower than Fortran for primarily CPU-intensive computational work. But then I thought using numpy would significantly improve the performance for a simple task like sorting.
So my test case was sorting an array/a list of size 10,000 containing random floats using bubble sort (just a test case with many array operations, so no need to comment on the performance of the algorithm itself). My timing results are as follows (all functions use identical algorithm):
Python3 (using numpy array, but my own function instead of numpy.sort): 33.115s
Python3 (using list): 9.927s
Fortran (gfortran) : 0.291s
Python3 (using numpy.sort): 0.269s (not a fair comparison, since it uses a different algorithm)
I was surprised that operating with numpy array is ~3 times slower than with python list, and ~100 times slower than Fortran. So at this point my questions are:
Upvotes: 5
Views: 7888
Reputation: 7293
- Why operating with numpy array is significantly slower than python list for this test case?
NumPy arrays are container for numerical data. They contain metadata (the type and shape of the array) and a block of memory for the data itself. As such, any operation on the elements of a NumPy array involve some overhead. Python lists are better optimized for "plain Python" code: reading or writing to a list element is faster than it is for a NumPy array. The benefit of NumPy array comes from "whole array operations" (so called array operations) and from compiled extensions. C/C++/Fortran, Cython or Numba can access the content of NumPy arrays without overhead.
- In case an algorithm that I need is not already implemented in scipy/numpy, and I need to write my own function within Python framework with best performance in mind, which data type I should operate with: numpy array or list?
For numerical code, NumPy arrays are better: you can access their content "à la C" or "à la Fortran" in compiled extensions.
- If my applications are performance oriented, and I want to write functions with equivalent performance as in-built numpy functions (e.g. np.sort), what tools/framework I should learn/use?
There are many. You can write in C using the NumPy C-API (complicated, I wouldn't advise it but it is good to know that it exists). Cython is a mature "Python-like" and "C-like" language that enables an easy and progressive performance improvements. Numba is a drop-in "just in time" interpreter: provided that you restrict your code to direct numerical operations on NumPy arrays, Numba will convert the code on the fly to a compiled equivalent.
Upvotes: 5
Reputation: 36765
You seem to be misunderstanding what NumPy does to speed up computations.
The speedup you get in NumPy does not come from NumPy using some smart way of saving data. Or compiling your Python code to C automatically.
Instead, NumPy implements many useful algorithms in C or Fortran, numpy.sort()
being one of them. These functions understand np.ndarray
s as input and loop over the data in a C/Fortran loop.
If you want to write fast NumPy code there are really three ways to do that:
np.ndarray
s (internally they're a contiguous array of the type you've chosen).Upvotes: 8