Reputation: 29

Efficiently find strings in list of lists of strings (Python)

I'm looking for an efficient way to find different strings in a list of string lists and return their indices. Here is the code:

inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
           [ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
           [ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]

# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]

Those reported in the result are the indices for the position in the output list of each string in the inp list. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist. Similarly for ans1. ans3, however, does not appear in any sublist and, therefore, the returned index is -1.

What I'm looking for is an efficient way to do this computation (possibly in parallel?) while avoiding the classic for loops that this can clearly be done with.

Some considerations:

output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 5.

Upvotes: 1

Answers (2)

alexnik42

Reputation: 188

You can try list comprehension:

result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Update:

Also it's probably not the best idea to store -1 as an index for strings, which are not presented in the output list. -1 is a valid index in Python, which may potentially lead to errors in the future if you plan to do something with indexes, stored in the result.

Upvotes: 1

Andrej Kesely

Reputation: 195613

You can create dictionary index first to speed-up the search:

inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

tmp = [{v: i for i, v in enumerate(subl)} for subl in output]

result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)

Prints:

[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Upvotes: 0

Efficiently find strings in list of lists of strings (Python)

Answers (2)

Related Questions