ymc

Reputation: 77

Why can a NumPy array created from a list be 10x faster to increment than one created with np.zeros?

I am using Python 3.8 and NumPy 1.17.4. The output of the following piece of code

import time
import sys
import numpy as np

if __name__ == '__main__':
  li = np.zeros(5000000,dtype=int)
  sys.stdout.write("%s %s\n" % (type(li),type(li[0])))
  start = time.process_time()
  li += 5
  sys.stdout.write("%.6fs\n" % (time.process_time()-start))
  li = np.zeros(5000000,dtype=int)
  li = list(li)
  li = np.array(li)
  sys.stdout.write("%s %s\n" % (type(li),type(li[0])))
  start = time.process_time()
  li += 5
  sys.stdout.write("%.6fs\n" % (time.process_time()-start))

looks like

<class 'numpy.ndarray'> <class 'numpy.int64'>
0.037046s
<class 'numpy.ndarray'> <class 'numpy.int64'>
0.003537s

How come the latter is 10x faster to increment?

Upvotes: 2

Views: 106

Answers (2)

XtianP

Reputation: 389

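It seems to me that the explanation is the same as here. Indeed, numpy.zeros() seems to be a "lazy" operation: the memory appears not to be physically allocated until it is first written to, so the first increment also pays for that allocation. I modified your sample twice: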

  1. I removed the two statements involving lists before the second measurement => the two measurements became identical
  2. I added a dummy operation to force numpy.zeros() to really allocate the memory => the two measurements became different again:

import time
import sys
import numpy as np

if __name__ == '__main__':
  li = np.zeros(5000000,dtype=int)
  sys.stdout.write("%s %s\n" % (type(li),type(li[0])))
  start = time.process_time()
  li += 5
  sys.stdout.write("%.6fs\n" % (time.process_time()-start))
  li = np.zeros(5000000,dtype=int)
  li += 0  # dummy operation: writes to every element before the timed increment
  sys.stdout.write("%s %s\n" % (type(li),type(li[0])))
  start = time.process_time()
  li += 5
  sys.stdout.write("%.6fs\n" % (time.process_time()-start))
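
The same idea can be sketched more directly by timing the first and the second in-place increment of a fresh np.zeros array. This is only a minimal sketch, assuming the lazy-allocation explanation applies on your platform; the helper name time_increment is hypothetical, and it uses time.perf_counter rather than the time.process_time of the original sample.

```python
import time
import numpy as np


def time_increment(arr, value=5):
    """Return the seconds taken by one in-place increment of arr (hypothetical helper)."""
    start = time.perf_counter()
    arr += value
    return time.perf_counter() - start


n = 5_000_000
li = np.zeros(n, dtype=np.int64)

# First write to the array: may include the cost of the OS actually providing the memory.
first = time_increment(li)
# Second write: every element has already been written to once.
second = time_increment(li)

print("first  increment: %.6fs" % first)
print("second increment: %.6fs" % second)
```

If the explanation is right, the first figure should tend to be noticeably larger than the second, but the exact ratio depends on the OS, the allocator, and the machine.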

Upvotes: 1

gofvonx

Reputation: 1439

Your timing may not be accurate: it measures a single run of the statement, which can be influenced by other factors. Using %timeit (an IPython magic) gives somewhat more consistent results, although the second method has a much higher standard deviation.

First method:

li = np.zeros(5000000,dtype=int)

%timeit zi = li + 5
12 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Second method:

li = np.zeros(5000000,dtype=int)
li = list(li)
li = np.array(li)

%timeit zi = li + 5
13.3 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
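
Outside IPython, roughly the same comparison can be made with the standard-library timeit module. This is a sketch under stated assumptions: the variable names and the repeat/number values (7 runs of 10 loops) are illustrative choices, not what %timeit picked above.

```python
import timeit
import numpy as np

n = 5_000_000

# First method: array straight from np.zeros.
from_zeros = np.zeros(n, dtype=np.int64)

# Second method: round trip through a Python list.
from_list = np.array(list(np.zeros(n, dtype=np.int64)))

# Each entry is the total time for `number` executions of the statement.
t_zeros = timeit.repeat(lambda: from_zeros + 5, repeat=7, number=10)
t_list = timeit.repeat(lambda: from_list + 5, repeat=7, number=10)

print("np.zeros : best %.3f ms per loop" % (min(t_zeros) / 10 * 1e3))
print("from list: best %.3f ms per loop" % (min(t_list) / 10 * 1e3))
```

Taking the minimum over several repeats reduces the influence of one-off effects, such as the first access to freshly allocated memory, on the reported figure.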

Upvotes: 0
