nomistic
nomistic

Reputation: 2962

python intersection of lists while not having the same index

I have a curious case, and after some time I have not come up with an adequate solution.

Say you have two lists and you need to find items that have the same index.

x = [1,4,5,7,8]
y = [1,3,8,7,9]

I am able to get a correct intersection of those which appear in both lists with the same index by using the following:

matches = [i for i, (a,b) in enumerate(zip(x,y)) if a==b)

This would return:

[0,3]

I am able to get a a simple intersection of both lists with the following (and in many other ways, this is just an example)

intersected = set(x) & set(y)

This would return this list:

[1,8,7,9]

Here's the question. I'm wondering for some ideas for a way of getting a list of items (as in the second list) which do not include those matches above but are not in the same position on the list.

In other words, I'm looking items in x that do not share the same index in the y

The desired result would be the index position of "8" in y, or [2]

Thanks in advance

Upvotes: 1

Views: 193

Answers (3)

Stefan Gruenwald
Stefan Gruenwald

Reputation: 2640

x = [1,4,5,7,8]
y = [1,3,8,7,9]

result=[]
for e in x:
    if e in y and x.index(e) != y.index(e):
        result.append((x.index(e),y.index(e),e))

print result  #gives tuple with x_position,y_position,value

This version goes item by item through the first list and checks whether the item is also in the second list. If it is, it compares the indices for the found item in both lists and if they are different then it stores both indices and the item value as a tuple with three values in the result list.

Upvotes: -1

Martijn Pieters
Martijn Pieters

Reputation: 1121992

I'd create a dictionary of indices for the first list, then use that to test if the second value is a) in that dictionary, and b) the current index is not present:

def non_matching_indices(x, y):
    x_indices = {}
    for i, v in enumerate(x):
        x_indices.setdefault(v, set()).add(i)

    return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]

The above takes O(len(x) + len(y)) time; a single full scan through the one list, then another full scan through the other, where each test to include i is done in constant time.

You really don't want to use a value in x containment test here, because that requires a scan (a loop) over x to see if that value is really in the list or not. That takes O(len(x)) time, and you do that for each value in y, which means that the fucntion takes O(len(x) * len(y)) time.

You can see the speed differences when you run a time trial with a larger list filled with random data:

>>> import random, timeit
>>> def using_in_x(x, y):
...     return [i for i, a in enumerate(y) if a in x and a != x[i]]
...
>>> x = random.sample(range(10**6), 1000)
>>> y = random.sample(range(10**6), 1000)
>>> for f in (using_in_x, non_matching_indices):
...     timer = timeit.Timer("f(x, y)", f"from __main__ import f, x, y")
...     count, total = timer.autorange()
...     print(f"{f.__name__:>20}: {total / count * 1000:6.3f}ms")
...
          using_in_x: 10.468ms
non_matching_indices:  0.630ms

So with two lists of 1000 numbers each, if you use value in x testing, you easily take 15 times as much time to complete the task.

Upvotes: 2

Prune
Prune

Reputation: 77847

You're so close: iterate through y; look for a value that is in x, but not at the same position:

offset = [i for i, a in enumerate(y) if a in x and a != x[i] ]

Result:

[2]

Including the suggested upgrade from pault, with respect to Martijn's comment ... the pre-processing reduces the complexity, in case of large lists:

>>> both = set(x) & set(y)
>>> offset = [i for i, a in enumerate(y) if a in both and a != x[i] ]

As PaulT pointed out, this is still quite readable at OP's posted level.

Upvotes: 3

Related Questions