jmracek

Reputation: 155

How does dtype affect row and column operation speed in Numpy?

I'm trying to understand how best to exploit the C-ordering of numpy arrays to write high-performance code. My expectation was that operations which traverse rows should be faster than those which traverse columns. Indeed, this was true for the first example I tried:

import numpy as np

X = np.ones((10000,10000),dtype='int64')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)

%timeit np.sum(X,axis=1)

This produces output:

int64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 79.6 ms per loop
10 loops, best of 3: 61.1 ms per loop

This is what I expected: summing along each row (axis=1) reads contiguous memory, so it should be faster than summing down each column (axis=0).
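For reference, here is a minimal sketch of what C-ordering means for the two axes. The small array `A` is a made-up example for illustration only, not part of the measurements above:

```python
import numpy as np

# Hypothetical 3x4 array, used only to illustrate memory layout
A = np.arange(12, dtype='int64').reshape(3, 4)

# C-ordered: moving one step along a row advances 8 bytes (one int64),
# moving one step down a column advances a whole row, 4 * 8 = 32 bytes
print(A.strides)  # (32, 8)

col_sums = np.sum(A, axis=0)  # collapses the rows: one sum per column
row_sums = np.sum(A, axis=1)  # collapses the columns: one sum per row
print(col_sums)  # [12 15 18 21]
print(row_sums)  # [ 6 22 38]
```

So `axis=1` adds up elements that sit next to each other in memory, while `axis=0` adds up elements that are a full row apart.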

Here is where I get very confused. If I change the dtype to float64, the column sum (axis=0) becomes almost twice as fast as the row sum (axis=1):

X = np.ones((10000,10000),dtype='float')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)

%timeit np.sum(X,axis=1)

This produces output:

float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 67.7 ms per loop
10 loops, best of 3: 123 ms per loop

Can someone please clarify why this is happening?

EDIT: It was suggested in the comments that I try again with a smaller matrix, (1000,1000). When I run:

import numpy as np

X = np.ones((1000,1000),dtype='float')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)

X = np.ones((1000,1000),dtype='int64')
print(X.dtype)
print(X.flags)

%timeit np.sum(X,axis=0)
%timeit np.sum(X,axis=1)

I get output:

float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
1000 loops, best of 3: 598 µs per loop
1000 loops, best of 3: 1.06 ms per loop
int64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
1000 loops, best of 3: 788 µs per loop
1000 loops, best of 3: 632 µs per loop

So the effect persists at the smaller size as well.

Upvotes: 3

Views: 458

Answers (1)

AGN Gazer

Reputation: 8378

I cannot reproduce your second result on OS X (various Python versions). My float64 timings look like your first result:

In [27]: X = np.ones((10000,10000),dtype='float64')
    ...: print(X.dtype)
    ...: print(X.flags)
    ...: 
    ...: %timeit np.sum(X,axis=0)
    ...: 
    ...: %timeit np.sum(X,axis=1)
    ...: 
float64
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
10 loops, best of 3: 67.6 ms per loop
10 loops, best of 3: 62 ms per loop

EDIT: I repeated all your computations directly with timeit.repeat():

import timeit
t = timeit.repeat('np.sum(X,axis=0)', setup="import numpy as np; X = np.ones((10000,10000),dtype='float64')", repeat=50, number=1)
print(min(t))  # float64, axis=0
t = timeit.repeat('np.sum(X,axis=1)', setup="import numpy as np; X = np.ones((10000,10000),dtype='float64')", repeat=50, number=1)
print(min(t))  # float64, axis=1
t = timeit.repeat('np.sum(X,axis=0)', setup="import numpy as np; X = np.ones((10000,10000),dtype='int64')", repeat=50, number=1)
print(min(t))  # int64, axis=0
t = timeit.repeat('np.sum(X,axis=1)', setup="import numpy as np; X = np.ones((10000,10000),dtype='int64')", repeat=50, number=1)
print(min(t))  # int64, axis=1

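with these timings in seconds (in every block below the four lines are, in order: float64 axis=0, float64 axis=1, int64 axis=0, int64 axis=1):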

Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08) 
IPython 5.3.0 -- An enhanced Interactive Python.
numpy 1.12.1

0.0637669563293 # float64, axis=0
0.0558688640594 # float64, axis=1
0.0669782161713 # int64, axis=0
0.0576930046082 # int64, axis=1

and

Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) 
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
numpy 1.13.1

0.06289491400821134
0.05558946297969669
0.0670205659698695
0.057950171001721174

and

Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 12:15:08) 
IPython 5.3.0 -- An enhanced Interactive Python.
numpy 1.11.3

0.06345970398979262
0.05561513203429058
0.07043616304872558
0.057934076990932226

Finally, on my Android phone:

Python 3.6.2 (default, Jul 19 2017, 11:01:41)
IPython 6.1.0
numpy 1.12.0

0.39130385394673795
0.24979593697935343
0.42852322908584028
0.28863119706511497

and on a Windows system (Python 3.4, 32-bit):

0.158213707338
0.149441164907
0.365552662475
0.128456460354
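If you want to repeat the comparison on your own machine, the four measurements can be collected in one loop. This is only a sketch: `time_sums` is a hypothetical helper name, and the default size and repeat count are smaller than the ones I used above so that it finishes quickly.

```python
import timeit

def time_sums(n=1000, repeat=5):
    """Hypothetical helper: best-of-`repeat` time of np.sum for each dtype/axis pair."""
    results = {}
    for dtype in ('float64', 'int64'):
        # Same setup string as above, with the size and dtype filled in
        setup = "import numpy as np; X = np.ones(({0},{0}),dtype='{1}')".format(n, dtype)
        for axis in (0, 1):
            stmt = 'np.sum(X,axis={0})'.format(axis)
            t = timeit.repeat(stmt, setup=setup, repeat=repeat, number=1)
            results[(dtype, axis)] = min(t)  # keep the fastest run, as above
    return results

for (dtype, axis), best in sorted(time_sums().items()):
    print('{0}, axis={1}: {2:.6f} s'.format(dtype, axis, best))
```

Calling `time_sums(n=10000, repeat=50)` should correspond to the settings used for the numbers above, but it needs roughly 800 MB for the array alone.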

Upvotes: 1
