dg99

Reputation: 5663

Efficient multiple, arbitrary index access in Python tuple?

I have a long Python tuple t. I would like to grab the elements at indices i1, i2, ..., iN from t as efficiently as possible. What's the best way?

One approach is:

(1)    result = [t[j] for j in (i1, i2, ..., iN)]

but this would seem to cause N separate lookups into the tuple. Is there a faster way? When Python does slices like this:

(2)    result = t[1:M:3]

I assume that it does not perform M/3 separate lookups. (Maybe it uses a bitmask and does a single copy operation?) Is there some way for me to capitalize on whatever Python does in (2) to make my arbitrary-index slice happen in a single copy?

Thanks.

Upvotes: 4

Views: 6861

Answers (5)

John La Rooy

Reputation: 304147

If you are doing a bunch of identical lookups, it may be worth using an itemgetter:

from operator import itemgetter
mygetter = itemgetter(i1, i2, ..., iN)
for tup in lots_of_tuples:
    result = mygetter(tup)

For a one-off lookup, the overhead of creating the itemgetter is not worthwhile.

A quick test in IPython shows:

In [1]: import random

In [2]: from operator import itemgetter

In [3]: t=tuple(range(1000))

In [4]: idxs = tuple(random.randrange(1000) for i in range(20))

In [5]: timeit [t[i] for i in idxs]
100000 loops, best of 3: 2.09 us per loop

In [6]: mygetter = itemgetter(*idxs)

In [7]: timeit mygetter(t)
1000000 loops, best of 3: 596 ns per loop

Obviously the difference will depend on the length of the tuple, the number of indices, etc.
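
One caveat worth noting (an addition of mine, not part of the original answer): itemgetter called with a single index returns the bare element rather than a 1-tuple, so code that always expects a sequence back needs to handle that case:

>>> from operator import itemgetter
>>> itemgetter(2)(('a', 'b', 'c'))     # single index: bare element, not a tuple
'c'
>>> itemgetter(2, 0)(('a', 'b', 'c'))  # two or more indices: a tuple
('c', 'a')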

Upvotes: 8

Ned Batchelder

Reputation: 375574

1) Are you sure you need the operation to go faster?

2) Another option is operator.itemgetter: it returns a tuple of the elements at the indices you pass it:

>>> import operator, string
>>> t = tuple(string.ascii_uppercase)
>>> operator.itemgetter(13,19,4,21,1)(t)
('N', 'T', 'E', 'V', 'B')

The operator module is implemented in C, so it will likely outperform a Python loop.

Upvotes: 0

Rosh Oxymoron

Reputation: 21055

The approach you've listed is already the natural way to pull arbitrary elements out of a tuple. In most cases the performance of such an expression doesn't matter, and tuning it is a premature optimisation; even if you do need speed, optimising the element access alone won't help much, because the loop itself stays slow due to interpreter overhead and reference counting of the temporary objects.

If you already have a performance issue, or this is part of CPU-heavy code, you can try several alternatives:

1) numpy arrays, using a boolean mask (see also the index-array sketch after this list):

>>> import numpy as np
>>> arr = np.array(xrange(2000))
>>> mask = np.array([False]*2000)
>>> mask[3] = True
>>> mask[300] = True
>>> arr[mask]
array([  3, 300])

2) You can use the C API to copy the elements using PyTuple_GET_ITEM, which accesses the internal array directly, but be warned that using the C API is not trivial and can introduce a lot of bugs.

3) You can use C arrays with the C API, using e.g. the buffer interface of array.array to glue the data access to Python.

4) You can use Cython with C arrays and a custom Cython type for data access from Python.

5) You can use Cython and numpy together.
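
A variation on the numpy approach in 1) (my own sketch, assuming numpy is available as np): instead of building a boolean mask, you can index with an array of positions directly, which matches the question's arbitrary i1, i2, ..., iN more closely:

>>> import numpy as np
>>> arr = np.arange(2000)
>>> idxs = np.array([3, 300, 7])   # arbitrary positions, in any order
>>> arr[idxs]                      # fancy indexing gathers them in one call
array([  3, 300,   7])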

Upvotes: 2

Mark Ransom

Reputation: 308130

Slicing can be more efficient because it has more constraints: the index must proceed in a linear fashion by a fixed amount. The indices in a list comprehension could be completely random, so no such optimization is possible.

Still, it's dangerous to make assumptions about efficiency. Try timing both ways and see if there's a significant difference.
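
For example, a minimal timing sketch using the standard timeit module (the tuple size, step, and repeat count here are arbitrary choices of mine, not from the answer):

import timeit

setup = "t = tuple(range(1000))"
# explicit per-index lookups over the same positions as the slice below
print(timeit.timeit("[t[j] for j in range(1, 1000, 3)]", setup=setup, number=10000))
# the built-in slice, which does the same selection in one operation
print(timeit.timeit("t[1:1000:3]", setup=setup, number=10000))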

Upvotes: 0

steveha

Reputation: 76695

Inside the list comprehension there is an implicit for loop, and I am pretty sure it is iterating through the tuple values with reasonable efficiency. I don't think you can improve on the list comprehension for efficiency.

If you just need the values you might be able to use a generator expression and avoid building the list, for a slight savings in time or memory.
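
For instance (a small sketch of my own, where indices stands in for the question's i1, i2, ..., iN and process is a hypothetical consumer of each value):

result = (t[j] for j in indices)   # generator expression: no intermediate list is built
for value in result:
    process(value)                 # hypothetical function that uses each value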

Upvotes: 0
