Reputation: 35255
index()
will give the first occurrence of an item in a list. Is there a neat trick which returns all indices in a list for an element?
Upvotes: 610
Views: 847780
Reputation: 36516
If you need to perform multiple lookups on a list, it may be more efficient to compile a dictionary with all of the positions of every distinct element. At that point, any single lookup is a constant-time operation.
To do this we can:
enumerate
provides this quite readily.itertools.groupby
to group those matching elements together.groupby
object to build a dictionary.from itertools import groupby
from operator import itemgetter
def build_table(lst):
snd = itemgetter(1)
e = enumerate(lst)
s = sorted(e, key=snd)
g = groupby(s, key=snd)
return {
k: [x[0] for x in v]
for k, v in g
}
Upvotes: 0
Reputation: 4795
If you are using Python 2, you can achieve the same functionality with this:
def f(my_list, value):
return filter(lambda x: my_list[x] == value, range(len(my_list)))
Where my_list
is the list you want to get the indexes of, and value
is the value searched. Usage:
f(some_list, some_element)
Upvotes: 1
Reputation: 287835
def occurrences(s, lst):
return (i for i,e in enumerate(lst) if e == s)
list(occurrences(1, [1,2,3,1])) # = [0, 3]
Upvotes: 12
Reputation: 62403
while-loop
in this answer is the fastest implementation tested.
test2()
below.np.where
to find the indices of a single value, which is not faster than a list-comprehension, if the time to convert a list to an array is includednumpy
and converting a list
to a numpy.array
probably makes using numpy
a less efficient option for most circumstances. A careful timing analysis would be necessary.
list
, converting the list
to an array
, and then using numpy
functions will likely be a faster option.np.where
and np.unique
to find the indices of all unique elements in a list.
np.where
on an array (including the time to convert the list to an array) is slightly slower than a list-comprehension on a list, for finding all indices of all unique elements.numpy
on an array can be found in Get a list of all indices of repeated elements in a numpy array[python 3.10.4, numpy 1.23.1]
and [python 3.11.0, numpy 1.23.4]
import numpy as np
import random # to create test list
# create sample list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(20)]
# convert the list to an array for use with these numpy methods
a = np.array(l)
# create a dict of each unique entry and the associated indices
idx = {v: np.where(a == v)[0].tolist() for v in np.unique(a)}
# print(idx)
{'s1': [7, 9, 10, 11, 17],
's2': [1, 3, 6, 8, 14, 18, 19],
's3': [0, 2, 13, 16],
's4': [4, 5, 12, 15]}
%timeit
on a 2M element list with 4 unique str
elements# create 2M element list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(2000000)]
def test1():
# np.where: convert list to array and find indices of a single element
a = np.array(l)
return np.where(a == 's1')
def test2():
# list-comprehension: on list l and find indices of a single element
return [i for i, x in enumerate(l) if x == "s1"]
def test3():
# filter: on list l and find indices of a single element
return list(filter(lambda i: l[i]=="s1", range(len(l))))
def test4():
# use np.where and np.unique to find indices of all unique elements: convert list to array
a = np.array(l)
return {v: np.where(a == v)[0].tolist() for v in np.unique(a)}
def test5():
# list comprehension inside dict comprehension: on list l and find indices of all unique elements
return {req_word: [idx for idx, word in enumerate(l) if word == req_word] for req_word in set(l)}
def get_indices1(x: list, value: int) -> list:
indices = list()
for i in range(len(x)):
if x[i] == value:
indices.append(i)
return indices
def get_indices2(x: list, value: int) -> list:
indices = list()
i = 0
while True:
try:
# find an occurrence of value and update i to that index
i = x.index(value, i)
# add i to the list
indices.append(i)
# advance i by 1
i += 1
except ValueError as e:
break
return indices
%timeit test1() # list of indices for specified value
%timeit test2() # list of indices for specified value
%timeit test3() # list of indices for specified value
%timeit test4() # dict of indices of all values
%timeit test5() # dict of indices of all values
%timeit get_indices1(l, 's1') # list of indices for specified value
%timeit get_indices2(l, 's1') # list of indices for specified value
python 3.12.0
209 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
78.5 ms ± 733 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
125 ms ± 757 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
340 ms ± 8.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
319 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
74.9 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
58.2 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 11
Reputation: 62403
for-loop
:enumerate
and a list comprehension are more pythonic, but not necessarily faster. However, this answer is aimed at students who may not be allowed to use some of those built-in functions.indices
for i in range(len(x)):
, which essentially iterates through a list of index locations [0, 1, 2, 3, ..., len(x)-1]
i
, where x[i]
is a match to value
, to indices
def get_indices(x: list, value: int) -> list:
indices = list()
for i in range(len(x)):
if x[i] == value:
indices.append(i)
return indices
n = [1, 2, 3, -50, -60, 0, 6, 9, -60, -60]
print(get_indices(n, -60))
>>> [4, 8, 9]
get_indices
, are implemented with type hints. In this case, the list, n
, is a bunch of int
s, therefore we search for value
, also defined as an int
.while-loop
and .index
:.index
, use try-except
for error handling, because a ValueError
will occur if value
is not in the list
.def get_indices(x: list, value: int) -> list:
indices = list()
i = 0
while True:
try:
# find an occurrence of value and update i to that index
i = x.index(value, i)
# add i to the list
indices.append(i)
# advance i by 1
i += 1
except ValueError as e:
break
return indices
print(get_indices(n, -60))
>>> [4, 8, 9]
Upvotes: 3
Reputation: 81
The simplest and far fastest I have found is to recreate as a dict that you can then loop over i.e.
l_full_list_w_duplicates=[...]
l_full_list_wo_duplicates=list(set(l_full_list_w_duplicates))
dict_location={}
for i in l_full_list_wo_duplicates:
dict_location[i]=[]
i_counter=0
for iin l_full_list_w_duplicates:
dict_location[i]+=[i_counter]
i_counter+=1
This runs in linear time and is very scalable
Upvotes: -1
Reputation: 22443
A very old question but I don't see this wrinkle in the answers, altough it is alluded to.
Using the accepted answer's enumerate
but with the in
option, allows a string search of a list, with the added bonus of partial hits and case insensitivity, should you so choose.
Providing either index positions or the found string itself.
Given a list of iso country names, the following gives an idea of its use
>>> query = "Tre".casefold()
>>>
>>> print([i for i, x in enumerate(country_list) if query in x.casefold()])
[280, 303, 352, 489]
>>>
>>> print([x for i, x in enumerate(country_list) if query in x.casefold()])
['Eritrea', 'Spain Extremadura', 'France Centre-Val de Loire', 'Italy Trentino-South Tyrol']
>>>
>>> country_list[280]
'Eritrea'
>>>
>>> country_list[489]
'Italy Trentino-South Tyrol'
>>> query = "out".casefold()
>>> print([i for i, x in enumerate(country_list) if query in x.casefold()])
[48, 54, 253, 421, 489, 651, 1075, 1099, 1147, 1240, 1242, 1248, 1306]
>>> print([x for i, x in enumerate(country_list) if query in x.casefold()])
['Australia New South Wales', 'Australia South Australia', 'Djibouti', 'South Georgia and Sandwich Islands', 'Italy Trentino-South Tyrol', 'South Korea', 'South Sudan', 'French Southern Territories', 'United States Outlying Islands', 'United States of America South Carolina', 'United States of America South Dakota', 'United States of America Outlying Islands', 'South Africa']
It's a simple case insensitive, partial string finder.
I suspect it's terribly inefficient on massive searches but for small ones it's easy and clean.
Upvotes: 0
Reputation: 172
Generators are fast and use a tiny memory footprint. They give you flexibility in how you use the result.
def indices(iter, val):
"""Generator: Returns all indices of val in iter
Raises a ValueError if no val does not occur in iter
Passes on the AttributeError if iter does not have an index method (e.g. is a set)
"""
i = -1
NotFound = False
while not NotFound:
try:
i = iter.index(val, i+1)
except ValueError:
NotFound = True
else:
yield i
if i == -1:
raise ValueError("No occurrences of {v} in {i}".format(v = val, i = iter))
The above code can be use to create a list of the indices: list(indices(input,value))
; use them as dictionary keys: dict(indices(input,value))
; sum them: sum(indices(input,value))
; in a for loop for index_ in indices(input,value):
; ...etc... without creating an interim list/tuple or similar.
In a for loop you will get your next index back when you call for it, without waiting for all the others to be calculated first. That means: if you break out of the loop for some reason you save the time needed to find indices you never needed.
.index
on the input iter
to find the next occurrence of
val
.index
to start at the point
after the last found occurrenceindex
raises a ValueError
I tried four different versions for flow control; two EAFP (using try - except
) and two TBYL (with a logical test in the while
statement):
while True:
... except ValueError: break
. Surprisingly, this was usually a touch slower than option 2 and (IMV) less readableerr
to identify when a ValueError
is raised. This is generally the fastest and more readable than 1while val in iter[i:]
. Unsurprisingly, this does not scale wellwhile i < last
The overall performance differences between 1,2 and 4 are negligible, so it comes down to personal style and preference. Given that .index
uses ValueError
to let you know it didn't find anything, rather than e.g. returning None
, an EAFP-approach seems fitting to me.
Here are the 4 code variants and results from timeit
(in milliseconds) for different lengths of input and sparsity of matches
@version("WhileTrueBreak", versions)
def indices2(iter, val):
i = -1
while True:
try:
i = iter.index(val, i+1)
except ValueError:
break
else:
yield i
@version("WhileErrFalse", versions)
def indices5(iter, val):
i = -1
err = False
while not err:
try:
i = iter.index(val, i+1)
except ValueError:
err = True
else:
yield i
@version("RemainingSlice", versions)
def indices1(iter, val):
i = 0
while val in iter[i:]:
i = iter.index(val, i)
yield i
i += 1
@version("LastOccurrence", versions)
def indices4(iter,val):
i = 0
last = len(iter) - tuple(reversed(iter)).index(val)
while i < last:
i = iter.index(val, i)
yield i
i += 1
Length: 100, Ocurrences: 4.0%
{'WhileTrueBreak': 0.0074799987487494946, 'WhileErrFalse': 0.006440002471208572, 'RemainingSlice': 0.01221001148223877, 'LastOccurrence': 0.00801000278443098}
Length: 1000, Ocurrences: 1.2%
{'WhileTrueBreak': 0.03101000329479575, 'WhileErrFalse': 0.0278000021353364, 'RemainingSlice': 0.08278000168502331, 'LastOccurrence': 0.03986000083386898}
Length: 10000, Ocurrences: 2.05%
{'WhileTrueBreak': 0.18062000162899494, 'WhileErrFalse': 0.1810499932616949, 'RemainingSlice': 2.9145700042136014, 'LastOccurrence': 0.2049500006251037}
Length: 100000, Ocurrences: 1.977%
{'WhileTrueBreak': 1.9361200043931603, 'WhileErrFalse': 1.7280600033700466, 'RemainingSlice': 254.4725100044161, 'LastOccurrence': 1.9101499929092824}
Length: 100000, Ocurrences: 9.873%
{'WhileTrueBreak': 2.832529996521771, 'WhileErrFalse': 2.9984100023284554, 'RemainingSlice': 1132.4922299943864, 'LastOccurrence': 2.6660699979402125}
Length: 100000, Ocurrences: 25.058%
{'WhileTrueBreak': 5.119729996658862, 'WhileErrFalse': 5.2082200068980455, 'RemainingSlice': 2443.0577100021765, 'LastOccurrence': 4.75954000139609}
Length: 100000, Ocurrences: 49.698%
{'WhileTrueBreak': 9.372120001353323, 'WhileErrFalse': 8.447749994229525, 'RemainingSlice': 5042.717969999649, 'LastOccurrence': 8.050809998530895}
Upvotes: 0
Reputation: 601599
You can use a list comprehension with enumerate
:
indices = [i for i, x in enumerate(my_list) if x == "whatever"]
The iterator enumerate(my_list)
yields pairs (index, item)
for each item in the list. Using i, x
as loop variable target unpacks these pairs into the index i
and the list item x
. We filter down to all x
that match our criterion, and select the indices i
of these elements.
Upvotes: 881
Reputation: 6025
A dynamic list comprehension based solution incase we do not know in advance which element:
lst = ['to', 'be', 'or', 'not', 'to', 'be']
{req_word: [idx for idx, word in enumerate(lst) if word == req_word] for req_word in set(lst)}
results in:
{'be': [1, 5], 'or': [2], 'to': [0, 4], 'not': [3]}
You can think of all other ways along the same lines as well but with index()
you can find only one index although you can set occurrence number yourself.
Upvotes: 3
Reputation: 764
Here is a time performance comparison between using np.where
vs list_comprehension
. Seems like np.where
is faster on average.
# np.where
start_times = []
end_times = []
for i in range(10000):
start = time.time()
start_times.append(start)
temp_list = np.array([1,2,3,3,5])
ixs = np.where(temp_list==3)[0].tolist()
end = time.time()
end_times.append(end)
print("Took on average {} seconds".format(
np.mean(end_times)-np.mean(start_times)))
Took on average 3.81469726562e-06 seconds
# list_comprehension
start_times = []
end_times = []
for i in range(10000):
start = time.time()
start_times.append(start)
temp_list = np.array([1,2,3,3,5])
ixs = [i for i in range(len(temp_list)) if temp_list[i]==3]
end = time.time()
end_times.append(end)
print("Took on average {} seconds".format(
np.mean(end_times)-np.mean(start_times)))
Took on average 4.05311584473e-06 seconds
Upvotes: -1
Reputation: 923
Using filter() in python2.
>>> q = ['Yeehaw', 'Yeehaw', 'Googol', 'B9', 'Googol', 'NSM', 'B9', 'NSM', 'Dont Ask', 'Googol']
>>> filter(lambda i: q[i]=="Googol", range(len(q)))
[2, 4, 9]
Upvotes: 4
Reputation: 23443
With enumerate(alist) you can store the first element (n) that is the index of the list when the element x is equal to what you look for.
>>> alist = ['foo', 'spam', 'egg', 'foo']
>>> foo_indexes = [n for n,x in enumerate(alist) if x=='foo']
>>> foo_indexes
[0, 3]
>>>
This function takes the item and the list as arguments and return the position of the item in the list, like we saw before.
def indexlist(item2find, list_or_string):
"Returns all indexes of an item in a list or a string"
return [n for n,item in enumerate(list_or_string) if item==item2find]
print(indexlist("1", "010101010"))
Output
[1, 3, 5, 7]
for n, i in enumerate([1, 2, 3, 4, 1]):
if i == 1:
print(n)
Output:
0
4
Upvotes: 6
Reputation: 71580
Or Use range
(python 3):
l=[i for i in range(len(lst)) if lst[i]=='something...']
For (python 2):
l=[i for i in xrange(len(lst)) if lst[i]=='something...']
And then (both cases):
print(l)
Is as expected.
Upvotes: 8
Reputation: 44485
more_itertools.locate
finds indices for all items that satisfy a condition.
from more_itertools import locate
list(locate([0, 1, 1, 0, 1, 0, 0]))
# [1, 2, 4]
list(locate(['a', 'b', 'c', 'b'], lambda x: x == 'b'))
# [1, 3]
more_itertools
is a third-party library > pip install more_itertools
.
Upvotes: 19
Reputation: 192
You can create a defaultdict
from collections import defaultdict
d1 = defaultdict(int) # defaults to 0 values for keys
unq = set(lst1) # lst1 = [1, 2, 2, 3, 4, 1, 2, 7]
for each in unq:
d1[each] = lst1.count(each)
else:
print(d1)
Upvotes: 3
Reputation: 4069
If you need to search for all element's positions between certain indices, you can state them:
[i for i,x in enumerate([1,2,3,2]) if x==2 & 2<= i <=3] # -> [3]
Upvotes: 3
Reputation: 8061
A solution using list.index
:
def indices(lst, element):
result = []
offset = -1
while True:
try:
offset = lst.index(element, offset+1)
except ValueError:
return result
result.append(offset)
It's much faster than the list comprehension with enumerate
, for large lists. It is also much slower than the numpy
solution if you already have the array, otherwise the cost of converting outweighs the speed gain (tested on integer lists with 100, 1000 and 10000 elements).
NOTE: A note of caution based on Chris_Rands' comment: this solution is faster than the list comprehension if the results are sufficiently sparse, but if the list has many instances of the element that is being searched (more than ~15% of the list, on a test with a list of 1000 integers), the list comprehension is faster.
Upvotes: 42
Reputation: 68682
While not a solution for lists directly, numpy
really shines for this sort of thing:
import numpy as np
values = np.array([1,2,3,1,2,4,5,6,3,2,1])
searchval = 3
ii = np.where(values == searchval)[0]
returns:
ii ==>array([2, 8])
This can be significantly faster for lists (arrays) with a large number of elements vs some of the other solutions.
Upvotes: 176
Reputation: 29103
One more solution(sorry if duplicates) for all occurrences:
values = [1,2,3,1,2,4,5,6,3,2,1]
map(lambda val: (val, [i for i in xrange(len(values)) if values[i] == val]), values)
Upvotes: 4
Reputation: 500347
How about:
In [1]: l=[1,2,3,4,3,2,5,6,7]
In [2]: [i for i,val in enumerate(l) if val==3]
Out[2]: [2, 4]
Upvotes: 27