Ruggero Turra
Ruggero Turra

Reputation: 17680

fastest list index searching

What is the fastest way to find the index of an element in a list of integers?

Now I am doing

if value in mylist:
    return mylist.index(value)

but it seems I am doing the same thing two times: to know if value is in mylist I also know the index position. I tried also other solutions:

try:
    return mylist.index(value)
except ValueError:
    return None

or

for i, x in enumerate(mylist):
    if x == value:
         return i
return None

but all these solutions seems to be slower.

The array is non-sorted with only 4 elements.

Upvotes: 3

Views: 730

Answers (3)

MAK
MAK

Reputation: 26586

Just using if-elses is fast, but if you are always searching on the same list (or your list doesn't change too often) you can be a bit faster by storing the element -> index mapping in a dict and then doing a dictionary lookup.

So your code should look something like this:

# Precompute the mapping.
mapping = { index: value for value, index in enumerate(TEST_LIST) }

# Search function:
def lookup(value):
  return mapping.get(value, None)

I ran some tests comparing this with other approaches. Here's my test code:

import timeit

TEST_LIST = [100, -2, 10007, 2**70 + 1]
mapping = { index: value for value, index in enumerate(TEST_LIST) }
NUM_TIMES = 10**6


def by_if_else(lst, value):
  if lst[0] == value:
    return 0
  elif lst[1] == value:
    return 1
  elif lst[2] == value:
    return 2
  elif lst[3] == value:
    return 3
  else:
    return None


def by_index(lst, value):
  for i in xrange(4):
    if lst[i] == value:
      return i
  return None


def by_exception(lst, value):
  try:
    lst.index(value)
  except ValueError:
    return None


def by_iter(lst, value):
  for index, element in enumerate(lst):
    if element == value:
      return value
  return None

def by_dict(lst, value):
  return mapping.get(value, None)


def TimeFunction(function_name, value):
  if 'dict' in function_name:
    return timeit.timeit(
        stmt = '%s(mapping, %d)' % (function_name, value),
        setup = 'from __main__ import %s, mapping' % function_name,
        number=NUM_TIMES)
  else:
    return timeit.timeit(
        stmt = '%s(TEST_LIST, %d)' % (function_name, value),
        setup = 'from __main__ import %s, TEST_LIST' % function_name,
        number=NUM_TIMES)


def RunTestsOn(value):
  print "Looking for %d in %s" % (value, str(TEST_LIST))
  function_names = [name for name in globals() if name.startswith('by_')]
  for function_name in function_names:
    print "Function: %s\nTime: %f" % (
        function_name, TimeFunction(function_name, value))



def main():
  values_to_look_for = TEST_LIST + [ -10**70 - 1, 55, 29]
  for value in values_to_look_for:
    RunTestsOn(value)


if __name__ == '__main__':
  main()

It looks like the if-else approach is faster when the values being searched for are small and are present in the list (I removed runtimes for the other functions):

Looking for 10007 in [100, -2, 10007, 1180591620717411303425L]
Function: by_dict
Time: 0.213232
Function: by_if_else
Time: 0.181917

But slower if the value is large (i.e. comparison in expensive):

Looking for 1180591620717411303425 in [100, -2, 10007, 1180591620717411303425L]
Function: by_dict
Time: 0.223594
Function: by_if_else
Time: 0.380222

Or, when the value isn't present in the list at all (even if the value is small):

Looking for 29 in [100, -2, 10007, 1180591620717411303425L]
Function: by_dict
Time: 0.195733
Function: by_if_else
Time: 0.267689

While it is obvious that using a dict should be faster due to queries on it being O(1) as opposed to O(n) for all the other approaches, for such a small list, the interpreter is probably creating optimized bytecode for the if-else version and the overhead of doing pointer chases through a hashtable offsetting a lot of the speed advantage of the dict. But it still appears to be slightly faster most of the time. I would suggest you test out this approach on your data and see which works better for you.

Upvotes: 1

Padraic Cunningham
Padraic Cunningham

Reputation: 180411

You can use a set to check for membership, it will be more efficient than checking the list but the greatest overhead is the indexing:

In [54]: l = [1,2,3,4]

In [55]: s = set([1,2,3,4])

In [56]: timeit l.index(6)  if 6 in s else False
10000000 loops, best of 3: 79.9 ns per loop

In [57]: timeit l.index(6)  if 6 in l else False
10000000 loops, best of 3: 141 ns per loop

In [58]: timeit l.index(4)  if 4 in l else False

1000000 loops, best of 3: 381 ns per loop

In [59]: timeit l.index(4)  if 4 in s else False
1000000 loops, best of 3: 333 ns per loop

Upvotes: 1

Tamim Shahriar
Tamim Shahriar

Reputation: 739

As you have only four items, you can also try this :

 if value == mylist[0]:
   return 0
 elif value == mylist[1]:
   return 1
 elif value == mylist[2]:
   return 2
 elif value == mylist [3]:
   return 3

Let me know how it works in your case. I am curious. :)

Upvotes: 16

Related Questions