CaptainCodeman

Reputation: 2201

Why is numpy list access slower than vanilla python?

I was under the impression that numpy would be faster for list operations, but the following example seems to indicate otherwise:

import numpy as np
import time

def ver1():
    a = [i for i in range(40)]
    b = [0 for i in range(40)]
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

def ver2():
    a = np.array([i for i in range(40)])
    b = np.array([0 for i in range(40)])
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

4.872278928756714
9.120521068572998

(I'm running 64-bit Python 3.4.3 in Windows 7, on an i7 920)

I do understand that this isn't the fastest way to copy a list, but I'm trying to find out if I'm using numpy incorrectly. Or is it the case that numpy is slower for this kind of operation and is only more efficient in more complex operations?

EDIT:

I also tried the following, which just does a direct copy via b[:] = a, and numpy is still twice as slow:

import numpy as np
import time

def ver6():
    a = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    b = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    for i in range(1000000):
        b[:] = a

def ver7():
    a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    b = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
    for i in range(1000000):
        b[:] = a

t0 = time.time()
ver6()
t1 = time.time()
ver7()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

0.36202096939086914
0.6750380992889404

Upvotes: 5

Views: 1435

Answers (2)

Jaime

Reputation: 67417

Most of what you are seeing is Python object creation from C native types.

A Python list is, at its heart, an array of PyObject pointers. When a and b are both Python lists, doing b[i] = a[i] implies:

  • decreasing the reference count of the object pointed to by b[i],
  • increasing the reference count of the object pointed to by a[i], and
  • copying the address stored in a[i] into b[i].
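The list case can be checked from Python itself. This is a minimal sketch, assuming CPython (where sys.getrefcount is available); the names obj, a and b are only for illustration:

```python
import sys

obj = object()          # a fresh object, so nothing else holds a reference to it
a = [obj]
b = [None]

before = sys.getrefcount(obj)
b[0] = a[0]             # copies the pointer and bumps the reference count
after = sys.getrefcount(obj)

print(after - before)   # 1
print(b[0] is a[0])     # True: both slots point at the same object
```

No new object is created by the assignment; only a pointer and a counter change.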

But if a and b are NumPy arrays, things are a little more elaborate, and the same b[i] = a[i] then requires:

  • creating a Python integer object from the native C integer type stored at a[i],
  • converting the Python integer object into a native C integer type, and storing its value in b[i], and
  • decreasing the reference count of the temporary Python integer object.

So the difference is mostly in creating and disposing of that intermediate Python object, which lists do not need to do.
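The temporary object is visible from Python. A minimal sketch, assuming CPython and a standard NumPy install: reading the same slot twice from a list gives back the one stored object, while reading it twice from an array gives two separate scalar objects, because each read has to wrap the raw C value anew.

```python
import numpy as np

a_list = [1000 + i for i in range(40)]
a_arr = np.array(a_list)

# A list stores pointers, so two reads of the same slot return the same object.
x = a_list[3]
y = a_list[3]
same_list_object = x is y

# An array stores raw C integers, so each read wraps the value in a new object.
p = a_arr[3]
q = a_arr[3]
same_array_object = p is q

print(same_list_object)    # True
print(same_array_object)   # False
print(type(x).__name__, type(p).__name__)
```

The last line prints the two element types; the exact NumPy integer type depends on the platform and NumPy version.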

Upvotes: 1

user2357112

Reputation: 280182

You're using NumPy wrong. NumPy's efficiency relies on doing as much work as possible in C-level loops instead of interpreted code. When you do

for j in range(40):
    b[j]=a[j]

that's an interpreted loop, with all the intrinsic interpreter overhead and more, because NumPy's indexing logic is far more complex than list indexing, and NumPy needs to create a new element wrapper object on every element retrieval. You're not getting any of the benefits of NumPy when you write code like this.

You need to write the code in such a way that the work happens in C:

b[:] = a

This would also improve the efficiency of the list operation, but it's much more important for NumPy.
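To get a feel for how this scales, here is a rough sketch using the standard timeit module. The array size of 100000 and the function names are illustrative choices, not taken from the question. With only 40 elements the fixed cost of each NumPy call can still dominate, so any advantage of the C-level copy should be expected to show up only as the arrays grow; the timings will vary by machine.

```python
import timeit
import numpy as np

N = 100000  # illustrative size, far larger than the 40 elements in the question

src_list = list(range(N))
dst_list = [0] * N
src_arr = np.arange(N)
dst_arr = np.zeros(N, dtype=src_arr.dtype)

def copy_list():
    dst_list[:] = src_list        # slice assignment on a list

def copy_array():
    dst_arr[:] = src_arr          # slice assignment on an array, loop runs in C

def copy_array_elementwise():
    for j in range(N):            # interpreted loop, one temporary object per read
        dst_arr[j] = src_arr[j]

t_list = timeit.timeit(copy_list, number=100) / 100
t_arr = timeit.timeit(copy_array, number=100) / 100
t_loop = timeit.timeit(copy_array_elementwise, number=1)

print("list slice copy:        {:.6f} s".format(t_list))
print("array slice copy:       {:.6f} s".format(t_arr))
print("array element-wise loop: {:.6f} s".format(t_loop))
```

All three functions produce the same result; only where the loop runs differs.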

Upvotes: 6
