LetsPlayYahtzee

Reputation: 7581

What's the time complexity of indexing a numpy array directly

I assume that when I have a numpy array, let's say

>>> nArray
array([[  23425.     ,  521331.40625],
       [  23465.     ,  521246.03125],
       [  23505.     ,  528602.8125 ],
       [  23545.     ,  531934.75   ],
       [  23585.     ,  534916.375  ],
       [  23865.     ,  527971.1875 ]])

direct indexing must be pretty efficient.

I imagine that something like nArray[0, 1] = 69696420 must be using a hash table, which would give a time complexity close to O(1). Is that right?

Update

As both answers noted, there is no hashing involved in indexing a numpy array; both give a clear explanation of how the indexing actually happens.

Update 2

I added a simple benchmark to confirm the validity of the answers.

Upvotes: 11

Views: 10347

Answers (3)

Ami Tavory

Reputation: 76317

On the one hand

must be using a hash-table which will give a time complexity close to O(1). Is that right?

is not quite true. Numpy arrays are basically contiguous blocks of homogeneous memory, with some extra metadata on the side about dimensions and strides. Access is therefore O(1), and just involves some trivial arithmetic to determine the position within the memory.
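As a quick sanity check of the O(1) claim (my own sketch, not from the answer), timing a single-element access on a tiny array and on a huge one gives essentially identical results:

import timeit

import numpy as np

small = np.zeros((10, 2))
large = np.zeros((10_000_000, 2))

# If indexing is just pointer arithmetic, access time should not depend
# on the size of the array or the position of the element.
t_small = timeit.timeit(lambda: small[5, 1], number=1_000_000)
t_large = timeit.timeit(lambda: large[9_999_999, 1], number=1_000_000)
print(t_small, t_large)  # roughly equal on a typical machine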

On the other hand

indexing must be pretty efficient.

is unfortunately not true at all. Aside from bounds checking (which numpy arrays do perform), everything involving pure Python is extremely inefficient, and element accesses involve pure-Python calls. Numpy array access is no exception. You should use vectorized operations whenever possible.
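To illustrate how much that per-access overhead costs (a minimal sketch of my own, not part of the answer), compare an element-by-element loop with a single vectorized call:

import timeit

import numpy as np

a = np.arange(100_000, dtype=np.float64)

# Every a[i] goes through the Python interpreter and boxes the result
# in a Python object.
t_loop = timeit.timeit(lambda: sum(a[i] for i in range(len(a))), number=10)
# A single vectorized call keeps the entire loop in compiled code.
t_vec = timeit.timeit(lambda: a.sum(), number=10)
print(t_loop, t_vec)  # the vectorized version is typically orders of magnitude faster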

Upvotes: 10

LetsPlayYahtzee

Reputation: 7581

To add some extra validation through testing to Ami's answer, I made a simple circular buffer from a numpy array that uses only direct indexing for insertions. Basically, each insertion just overwrites the values of the oldest element in the queue.

The code is not completely bug-free, but it can serve as a basis for some simple performance benchmarking.

import copy
import math

import numpy as np


class CircFifo:
    """
    Helper class: uses a numpy array to provide a circular, fixed-size
    fifo interface.

    put(element): removes the oldest element and
    places a new one

    get(): returns the oldest entry

    empty(): returns True if fifo is empty

    full():  returns True if fifo is full
    """
    def __init__(self, size):
        self.array = np.empty(shape=(size, 2))
        self.size = size
        self.array[:] = np.nan  # NaN marks unused slots
        self.top = 0
        self.bottom = 0

    def put(self, row):
        # Insertion is just a direct index assignment
        self.array[self.top, :] = row
        self.top += 1
        if self.top == self.size:
            self.top = 0

    def get(self):
        row = None  # returned unchanged if the fifo is empty
        if not math.isnan(self.array[self.bottom, 0]):
            row = copy.deepcopy(self.array[self.bottom, :])
            self.array[self.bottom, :] = np.nan
            self.bottom += 1
        if self.bottom == self.size:
            self.bottom = 0
        if math.isnan(self.array[self.bottom, 0]):
            # fifo drained completely; reset both pointers
            self.bottom = 0
            self.top = 0
        return row

    def empty(self):
        return math.isnan(self.array[self.bottom, 0])

    def full(self):
        # full when no slot is still marked with NaN
        return np.count_nonzero(np.isnan(self.array[:, 0])) == 0

The correctness of the answers in the post seems to be confirmed by a simple test that I ran. I tested the insertion performance against a deque object. Even for 1000 insertions, deque, which is a dynamic rather than a static data structure (as opposed to my static circular fifo), clearly outperforms the circular fifo.

Here is the simple test I ran

In [5]: import time

In [6]: circFifo = CircFifo(300)

In [7]: elapsedTime = 0

In [8]: for i in range(1, 1000):
   ...:         start = time.time()
   ...:         circFifo.put(np.array([[52, 12]]))
   ...:         elapsedTime += time.time() - start
   ...:     

In [9]: elapsedTime
Out[9]: 0.010616540908813477


In [20]: from collections import deque

In [21]: queue = deque()

In [22]: elapsedTime = 0

In [23]: for i in range(1, 1000):
   ....:         start = time.time()
   ....:         queue.append(np.array([[52, 12]]))
   ....:         elapsedTime += time.time() - start
   ....:     

In [24]: elapsedTime
Out[24]: 0.00482630729675293

I know that this benchmark is not that informative, but it makes it quite apparent that deque is much faster, at least for that number of insertions.

I would expect that if that circular fifo were implemented in C with a static array, it could not easily be outperformed, since a static C array is just about the simplest, lowest-overhead data structure available.

Upvotes: 2

Dietrich Epp

Reputation: 213378

There is no hash table involved. Numpy arrays are arrays, just like the name implies, and the address is computed like this:

address of nArray[x, y] = base address + A * x + B * y
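Here A and B are the array's strides in bytes. A small sketch (my addition, using numpy's strides and ctypes.data attributes) makes the formula concrete:

import numpy as np

nArray = np.zeros((6, 2))  # same shape as the array in the question
x, y = 3, 1
A, B = nArray.strides      # strides in bytes, e.g. (16, 8) for float64
address = nArray.ctypes.data + A * x + B * y
# address is the machine address of element nArray[x, y]
print(hex(address))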

Upvotes: 4
