Optimizing a nested for loop with two lists

Question

I have a program that searches through two separate lists, lets call them list1 and list2.

I only want to print the instances where list1 and list2 have matching items. The thing is, not all items in both lists match eachother, but the first, third and fourth items should.

If they match, I want the complete lists (including the mismatching items) to be appended to two corresponding lists.

I have written the follow code:

for item in list1:
    for item2 in list2:
        if (item[0] and item[2:4])==(item[0] and item2[2:4]):
            newlist1.append(item)
            newlist2.append(item2)
            break

This works, but it's quite inefficient. For some of the larger files I'm looking through it can take more than 10 seconds to complete the match, and it should ideally be at most half of that.

What I'm thinking is that it shouldn't have to start over from the beginning in list2 each time the code is run, it should be enough to continue from the last point where there was a match. But I don't know how to write it in code.

tobias_k · Accepted Answer

Your condition (item[0] and item[2:4])==(item[0] and item2[2:4]) is wrong.

Besides that the second item[0] should probably be item2[0], what (item[0] and item[2:4]) does is the following (analogously for (item2[0] and item2[2:4])):

if item[0] is 0, it returns item[0] itself, i.e. 0
if item[0] is not 0, it returns whatever item[2:4] is

And this is then compared to the result of the second term. Thus, [0,1,1,1] would "equal" [0,2,2,2], and [1,1,1,1] would "equal" [2,1,1,1].

Try using tuples instead:

if (item[0], item[2:4]) == (item2[0], item2[2:4]):

Or use operator.itemgetter as suggested in the other answer.

To speed up the pairwise matching of items from both lists, put the items from the first list into a dictionary, using those tuples as key, and then iterating over the other list and looking up the matching items in the dictionary. Complexity will be O(n+m) instead of O(n*m) (n and m being the length of the lists).

key = operator.itemgetter(0, 2, 3)

list1_dict = {}
for item in list1:
    list1_dict.setdefault(key(item), []).append(item)

for item2 in list2:
    for item in list1_dict.get(key(item2), []):
        newlist1.append(item)
        newlist2.append(item2)

Optimizing a nested for loop with two lists

Answers (2)

Related Questions