Reputation: 61
I have two sorted lists and I need to find the odd ones out in each list.
Currently I am using two list comprehensions with not in
:
> if foo != bar:
in_foo = [i for i in foo if i not in bar]
in_bar = [i for i in bar if i not in foo]
However, this method doesn't take advantage of the sorted structure of the list.
I can use an alternative method with a counter variable and recursion, but is there a more pythonic way of doing this?
edit: sorted output is preferred. Thanks.
Upvotes: 1
Views: 81
Reputation: 11602
Here is a comparison of three methods. The with_intersection
method allows for repeated values within each list, the other two do not. The test considers two sorted lists, each with one million distinct integers.
The using_sorted
method takes advantage of the fact that both lists are sorted and does not use sets. At the same time, it is the slowest, most verbose and error-prone.
import numpy as np # only for data generation
lst1 = np.random.randint(1, 20, 10**6).cumsum().tolist()
lst2 = np.random.randint(1, 20, 10**6).cumsum().tolist()
def with_intersection(lst1, lst2):
common = set(lst1).intersection(lst2)
res1 = [x for x in lst1 if x not in common]
res2 = [x for x in lst2 if x not in common]
return res1, res2
def set_then_sort(foo, bar):
_foo = set(foo)
_bar = set(bar)
in_foo = _foo - _bar
in_bar = _bar - _foo
return sorted(in_foo), sorted(in_bar)
def using_sorted(lst1, lst2):
res1 = list()
res2 = list()
n1 = len(lst1)
n2 = len(lst2)
i = j = 0
while True:
while i < n1 and j < n2 and lst1[i] < lst2[j]:
res1.append(lst1[i])
i += 1
while j < n2 and i < n1 and lst1[i] > lst2[j]:
res2.append(lst2[j])
j += 1
while i < n1 and j < n2 and lst1[i] == lst2[j]:
i += 1
j += 1
if i == n1:
res2.extend(lst2[j:])
break
elif j == n2:
res1.extend(lst1[i:])
break
return res1, res2
assert with_intersection(lst1, lst2) == set_then_sort(lst1, lst2) == using_sorted(lst1, lst2)
# %timeit with_intersection(lst1, lst2) # 306 ms
# %timeit set_then_sort(lst1, lst2) # 491 ms
# %timeit using_sorted(lst1, lst2) # 870 ms
Upvotes: 1
Reputation: 1888
By putting a None trailer at the end of each list, we can use iterators and focus on recording the differences. This algorithm is O(n+m):
foo = [1, 3, 4, 5]
bar = [2, 4, 5, 6]
i_foo = iter(foo + [None])
i_bar = iter(bar + [None])
n_foo = next(i_foo)
n_bar = next(i_bar)
in_foo = []
in_bar = []
while True:
if n_foo is None:
if n_bar is None:
break
in_bar.append(n_bar)
n_bar = next(i_bar)
continue
if n_bar is None:
in_foo.append(n_foo)
n_foo = next(i_foo)
if n_foo == n_bar:
n_foo = next(i_foo)
n_bar = next(i_bar)
continue
if n_foo < n_bar:
in_foo.append(n_foo)
n_foo = next(i_foo)
else:
in_bar.append(n_bar)
n_bar = next(i_bar)
print(in_foo, in_bar)
# [1, 3] [2, 6]
Upvotes: 0
Reputation: 18796
Any time you have something like this, it's frequently better to use a set and ignore the sorting (which matters significantly less in Python for small lists than for other programming languages due to the language overhead)
_foo = set(foo)
_bar = set(bar)
in_foo = _foo - _bar
in_bar = _bar - _foo
Upvotes: 6