marduk
marduk

Reputation: 61

Contrasting two sorted lists in Python

I have two sorted lists and I need to find the odd ones out in each list.

Currently I am using two list comprehensions with not in:

> if foo != bar:
      in_foo = [i for i in foo if i not in bar]
      in_bar = [i for i in bar if i not in foo]

However, this method doesn't take advantage of the sorted structure of the list.

I can use an alternative method with a counter variable and recursion, but is there a more pythonic way of doing this?

edit: sorted output is preferred. Thanks.

Upvotes: 1

Views: 81

Answers (3)

hilberts_drinking_problem
hilberts_drinking_problem

Reputation: 11602

Here is a comparison of three methods. The with_intersection method allows for repeated values within each list, the other two do not. The test considers two sorted lists, each with one million distinct integers.

The using_sorted method takes advantage of the fact that both lists are sorted and does not use sets. At the same time, it is the slowest, most verbose and error-prone.

import numpy as np # only for data generation

lst1 = np.random.randint(1, 20, 10**6).cumsum().tolist()
lst2 = np.random.randint(1, 20, 10**6).cumsum().tolist()

def with_intersection(lst1, lst2):
  common = set(lst1).intersection(lst2)
  res1 = [x for x in lst1 if x not in common]
  res2 = [x for x in lst2 if x not in common]
  return res1, res2

def set_then_sort(foo, bar):
  _foo   = set(foo)
  _bar   = set(bar)
  in_foo = _foo - _bar
  in_bar = _bar - _foo
  return sorted(in_foo), sorted(in_bar)

def using_sorted(lst1, lst2):
  res1 = list()
  res2 = list()
  n1 = len(lst1)
  n2 = len(lst2)
  i = j = 0
  while True:
    while i < n1 and j < n2 and lst1[i] < lst2[j]: 
      res1.append(lst1[i])
      i += 1
    while j < n2 and i < n1 and lst1[i] > lst2[j]:
      res2.append(lst2[j])
      j += 1
    while i < n1 and j < n2 and lst1[i] == lst2[j]:
      i += 1
      j += 1
    if i == n1:
      res2.extend(lst2[j:])
      break
    elif j == n2:
      res1.extend(lst1[i:])
      break
  return res1, res2
      
assert with_intersection(lst1, lst2) == set_then_sort(lst1, lst2) == using_sorted(lst1, lst2)

# %timeit with_intersection(lst1, lst2) # 306 ms
# %timeit set_then_sort(lst1, lst2)     # 491 ms
# %timeit using_sorted(lst1, lst2)      # 870 ms 

Upvotes: 1

VirtualScooter
VirtualScooter

Reputation: 1888

By putting a None trailer at the end of each list, we can use iterators and focus on recording the differences. This algorithm is O(n+m):

foo = [1, 3, 4, 5]
bar = [2, 4, 5, 6]

i_foo = iter(foo + [None])
i_bar = iter(bar + [None])

n_foo = next(i_foo)
n_bar = next(i_bar)
in_foo = []
in_bar = []
while True:
    if n_foo is None:
        if n_bar is None:
            break
        in_bar.append(n_bar)
        n_bar = next(i_bar)
        continue
    if n_bar is None:
        in_foo.append(n_foo)
        n_foo = next(i_foo)
    if n_foo == n_bar:
        n_foo = next(i_foo)
        n_bar = next(i_bar)
        continue
    if n_foo < n_bar:
        in_foo.append(n_foo)
        n_foo = next(i_foo)
    else:
        in_bar.append(n_bar)
        n_bar = next(i_bar)
print(in_foo, in_bar)
# [1, 3] [2, 6]

Upvotes: 0

ti7
ti7

Reputation: 18796

Any time you have something like this, it's frequently better to use a set and ignore the sorting (which matters significantly less in Python for small lists than for other programming languages due to the language overhead)

_foo   = set(foo)
_bar   = set(bar)
in_foo = _foo - _bar
in_bar = _bar - _foo

Upvotes: 6

Related Questions