Reputation: 155
I'm trying to understand how best to exploit the C-ordering of numpy arrays to write high-performance code. My expectation was that operations which traverse rows should be faster than those which traverse columns. Indeed, this was true for the first example I tried:
import numpy as np
X = np.ones((10000,10000),dtype='int64')
print(X.dtype)
print(X.flags)
%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)
This produces output:
int64
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
10 loops, best of 3: 79.6 ms per loop
10 loops, best of 3: 61.1 ms per loop
This is what I expected, since summing along the rows (axis=1) should be faster than summing along the columns (axis=0).
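To make the layout concrete, here is a minimal sketch of what C-ordering means for strides and for which axis reads contiguous memory. The small 4x6 shape is an arbitrary choice for illustration, not the array timed above.

```python
import numpy as np

# Small stand-in for the 10000x10000 array; the shape is an assumption
# chosen only to keep the example quick to inspect.
X = np.ones((4, 6), dtype='int64')

# In C order the last axis varies fastest: stepping along a row (axis=1)
# moves one 8-byte element, stepping down a column (axis=0) moves a whole row.
print(X.strides)                 # (48, 8)
print(X.flags['C_CONTIGUOUS'])   # True

# np.sum(X, axis=1) reduces each row (elements adjacent in memory);
# np.sum(X, axis=0) reduces each column (elements one row-stride apart).
row_sums = np.sum(X, axis=1)
col_sums = np.sum(X, axis=0)
print(row_sums.shape, col_sums.shape)  # (4,) (6,)
```

This only shows the memory layout; it says nothing by itself about which reduction numpy actually executes faster.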
Here is where I get very confused. If I change the dtype to float64, the column sum (axis=0) becomes almost twice as fast as the row sum (axis=1):
X = np.ones((10000,10000),dtype='float')
print(X.dtype)
print(X.flags)
%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)
Produces output:
float64
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
10 loops, best of 3: 67.7 ms per loop
10 loops, best of 3: 123 ms per loop
Can someone please clarify why this is happening?
EDIT: It was suggested in the comments that I try again with a smaller matrix, (1000,1000). When I run:
import numpy as np
X = np.ones((1000,1000),dtype='float')
print(X.dtype)
print(X.flags)
%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)
X = np.ones((1000,1000),dtype='int64')
print(X.dtype)
print(X.flags)
%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)
I get output:
float64
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
1000 loops, best of 3: 598 µs per loop
1000 loops, best of 3: 1.06 ms per loop
int64
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
1000 loops, best of 3: 788 µs per loop
1000 loops, best of 3: 632 µs per loop
So the effect persists at the smaller size as well.
Upvotes: 3
Views: 458
Reputation: 8378
I cannot reproduce your second result on OS X (various Python versions); my float64 timings look like your first (int64) result:
In [27]: X = np.ones((10000,10000),dtype='float64')
...: print(X.dtype)
...: print(X.flags)
...:
...: %timeit np.sum(X,axis=0)
...:
...: %timeit np.sum(X,axis=1)
...:
float64
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
10 loops, best of 3: 67.6 ms per loop
10 loops, best of 3: 62 ms per loop
EDIT: I repeated all your computations using timeit.repeat() directly:
import timeit
# Same four measurements, in the same order: float64 then int64, axis=0 then axis=1.
for dtype in ('float64', 'int64'):
    for axis in (0, 1):
        setup = "import numpy as np; X = np.ones((10000,10000),dtype='%s')" % dtype
        t = timeit.repeat('np.sum(X,axis=%d)' % axis, setup=setup, repeat=50, number=1)
        print(min(t))
with these timings:
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08)
IPython 5.3.0 -- An enhanced Interactive Python.
numpy 1.12.1
0.0637669563293 # float64, axis=0
0.0558688640594 # float64, axis=1
0.0669782161713 # int64, axis=0
0.0576930046082 # int64, axis=1
and
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
numpy 1.13.1
0.06289491400821134 # float64, axis=0
0.05558946297969669 # float64, axis=1
0.0670205659698695 # int64, axis=0
0.057950171001721174 # int64, axis=1
and
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 12:15:08)
IPython 5.3.0 -- An enhanced Interactive Python.
numpy 1.11.3
0.06345970398979262 # float64, axis=0
0.05561513203429058 # float64, axis=1
0.07043616304872558 # int64, axis=0
0.057934076990932226 # int64, axis=1
Finally, on my Android phone:
Python 3.6.2 (default, Jul 19 2017, 11:01:41)
IPython 6.1.0
numpy 1.12.0
0.39130385394673795 # float64, axis=0
0.24979593697935343 # float64, axis=1
0.42852322908584028 # int64, axis=0
0.28863119706511497 # int64, axis=1
and on a Windows system (Python 3.4, 32-bit):
0.158213707338 # float64, axis=0
0.149441164907 # float64, axis=1
0.365552662475 # int64, axis=0
0.128456460354 # int64, axis=1
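Since the numbers clearly depend on platform and numpy build, it may help to collect them with one script that prints the environment alongside the timings. The following is only a sketch: the function name `time_sums` and its `shape` and `repeat` defaults are made up for illustration, and the defaults are smaller than the (10000,10000), repeat=50 runs above so that it finishes quickly.

```python
import platform
import sys
import timeit

import numpy as np


def time_sums(shape=(1000, 1000), repeat=5):
    """Return {(dtype, axis): best time in seconds} for np.sum on a ones array.

    `time_sums`, `shape` and `repeat` are illustrative names and defaults,
    not the exact settings used for the measurements above.
    """
    results = {}
    for dtype in ('float64', 'int64'):
        X = np.ones(shape, dtype=dtype)
        for axis in (0, 1):
            # number=1 makes each sample a single call; keep the minimum.
            samples = timeit.repeat(lambda: np.sum(X, axis=axis),
                                    repeat=repeat, number=1)
            results[(dtype, axis)] = min(samples)
    return results


if __name__ == '__main__':
    # Report the environment first so results can be compared across machines.
    print(sys.version)
    print(platform.platform())
    print('numpy', np.__version__)
    for (dtype, axis), best in time_sums().items():
        print('%s, axis=%d: %.6f s' % (dtype, axis, best))
```

To mirror the runs above, call `time_sums(shape=(10000, 10000), repeat=50)`; taking the minimum over repeats follows the same convention as the `min(t)` used earlier.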
Upvotes: 1