Reputation: 29
I'm looking for an efficient way to find several strings in a list of lists of strings and return their indices. Here is an example:
inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
[ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
[ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]
# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]
The values reported in the result are the indices of each string of the inp list within each sublist of output. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist. Similarly for ans1. ans3, however, does not appear in any sublist and, therefore, the returned index is -1.
What I'm looking for is an efficient way to do this computation (possibly in parallel?) that avoids the plain nested for loops with which this can obviously be done.
Some considerations:
output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 5.
Upvotes: 1
Views: 177
Reputation: 188
You can try a list comprehension:
result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
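The comprehension above scans a sublist twice whenever the string is present: once for the in test and once for index. A minimal sketch that scans only once, assuming a helper named index_or_default (a hypothetical name, not part of the original answer), could look like this:

```python
def index_or_default(seq, item, default=-1):
    """Return the first index of item in seq, or default if item is absent."""
    try:
        return seq.index(item)
    except ValueError:
        # list.index raises ValueError when the item is not found
        return default


inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

# One row per string in inp, one column per sublist in output
result = [[index_or_default(o, s) for o in output] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```

This is still a linear scan per lookup, so it mainly helps readability and avoids the duplicated search.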
Update:
Also, it's probably not the best idea to store -1 as the index for strings that are not present in the output list. -1 is a valid index in Python (it refers to the last element), so it may lead to silent errors later if you use the indices stored in the result to look elements up.
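To illustrate the point about -1, here is a small sketch that uses None as the "missing" marker instead. The choice of None is only one possible convention, and the variable names result_none, last and failed_loudly are made up for this example:

```python
inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

# Same comprehension as before, but with None for strings that are absent
result_none = [[o.index(s) if s in o else None for o in output] for s in inp]
print(result_none)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]

# With a -1 marker, indexing silently returns the last element
last = output[0][-1]
print(last)  # ddd

# With a None marker, the same mistake fails loudly
try:
    output[0][result_none[2][0]]
    failed_loudly = False
except TypeError:
    # Lists cannot be indexed with None
    failed_loudly = True
print(failed_loudly)  # True
```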
Upvotes: 1
Reputation: 195613
You can build a dictionary index for each sublist first to speed up the search:
inp = ["ans1", "ans2", "ans3"]
output = [
["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]
tmp = [{v: i for i, v in enumerate(subl)} for subl in output]
result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)
Prints:
[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
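One caveat: when a string is repeated within a sublist, the dict comprehension above keeps the last index, whereas list.index returns the first. That does not matter for the sample data, since none of the searched strings is repeated, but if first-occurrence semantics are needed, a sketch along these lines should work (first_index_map is a hypothetical helper name):

```python
def first_index_map(seq):
    """Map each value in seq to the index of its first occurrence."""
    index = {}
    for i, v in enumerate(seq):
        # setdefault only stores i if v has not been seen yet
        index.setdefault(v, i)
    return index


row = ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"]

last_wins = {v: i for i, v in enumerate(row)}  # later duplicates overwrite
first_wins = first_index_map(row)

print(last_wins["aaa"], first_wins["aaa"], row.index("aaa"))  # 5 1 1
```

Either way, building the dictionaries costs one pass per sublist, after which each lookup is a constant-time dict.get on average.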
Upvotes: 0