Reputation: 2453
I've found many methods of finding list intersections here, but I'm having trouble finding an efficient way to find the intersection when order is taken into account.
list1 = [1, 2, 3, 4, 5, 6, 7]
list2 = [7, 6, 3, 4, 5, 8]
The function should return [3, 4, 5]
I would already know there is only one overlapping sequence, and I would know its minimum length, but not its exact length.
Upvotes: 1
Views: 1414
Reputation: 1125208
You are looking for the Longest Common Subsequence algorithm; the following uses dynamic programming to find the elements in O(NM) time (for sequences of length N and M):
def lcs(a, b):
tbl = [[0 for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
for i, x in enumerate(a):
for j, y in enumerate(b):
tbl[i + 1][j + 1] = tbl[i][j] + 1 if x == y else max(
tbl[i + 1][j], tbl[i][j + 1])
res = []
i, j = len(a), len(b)
while i and j:
if tbl[i][j] == tbl[i - 1][j]:
i -= 1
elif tbl[i][j] == tbl[i][j - 1]:
j -= 1
else:
res.append(a[i - 1])
i -= 1
j -= 1
return res[::-1]
Demo:
>>> def lcs(a, b):
... tbl = [[0 for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
... for i, x in enumerate(a):
... for j, y in enumerate(b):
... tbl[i + 1][j + 1] = tbl[i][j] + 1 if x == y else max(
... tbl[i + 1][j], tbl[i][j + 1])
... res = []
... i, j = len(a), len(b)
... while i and j:
... if tbl[i][j] == tbl[i - 1][j]:
... i -= 1
... elif tbl[i][j] == tbl[i][j - 1]:
... j -= 1
... else:
... res.append(a[i - 1])
... i -= 1
... j -= 1
... return res[::-1]
...
>>> list1 = [1, 2, 3, 4, 5, 6, 7]
>>> list2 = [7, 6, 3, 4, 5, 8]
>>> lcs(list1, list2)
[3, 4, 5]
This will find the subsequence regardless of location and if other elements are mixed in between:
>>> lcs([1, 2, 3, 4, 5, 6, 7], [7, 3, 6, 4, 8, 5])
[3, 4, 5]
Upvotes: 3