Reputation: 2205
Using Python how do you reduce a list of lists by an ordered subset match [[..],[..],..]
?
In the context of this question a list L is a subset of list M
if M
contains all members of L
, and in the same order. For example, the list [1,2] is a subset of the list [1,2,3], but not of the list [2,1,3].
Example input:
a. [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Expected result:
a. [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Further Examples:
L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]]
- No reduce
L = [[1, 2, 3, 4, 5, 6, 7],
, [1, 2, 3]
[1, 2, 4, 8]]
- Yes reduce
L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]]
- No reduce
(Sorry for causing confusion with the incorrect data set.)
Upvotes: 14
Views: 20037
Reputation: 49013
So what you really wanted was to know if a list was a substring, so to speak, of another, with all the matching elements consecutive. Here is code that converts the candidate and the target list to comma-separated strings and does a substring comparison to see if the candidate appears within the target list
def is_sublist_of_any_list(cand, lists):
def comma_list(l):
return "," + ",".join(str(x) for x in l) + ","
cand = comma_list(cand)
return any(cand in comma_list(target) for target in lists if len(cand) <= len(target))
def super_lists(lists):
return [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
The function comma_list() puts leading and trailing commas on the list to ensure that integers are fully delimited. Otherwise, [1] would be a subset of [100], for example.
Upvotes: 0
Reputation: 2205
Thanks to all who suggested solutions and coping with my sometimes erroneous data sets. Using @hughdbrown solution I modified it to what I wanted:
The modification was to use a sliding window over the target to ensure the subset sequence was found. I think I should have used a more appropriate word than 'Set' to describe my problem.
def is_sublist_of_any_list(cand, lists):
# Compare candidate to a single list
def is_sublist_of_list(cand, target):
try:
i = 0
try:
start = target.index(cand[0])
except:
return False
while start < (len(target) + len(cand)) - start:
if cand == target[start:len(cand)]:
return True
else:
start = target.index(cand[0], start + 1)
except ValueError:
return False
# See if candidate matches any other list
return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))
# Compare candidates to all other lists
def super_lists(lists):
a = [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
return a
lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
expect = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
def test():
out = super_lists(list(lists))
print "In : ", lists
print "Out : ", out
assert (out == expect)
Result:
In : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Out : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Upvotes: 0
Reputation: 49013
A list is a superlist if it is not a subset of any other list. It's a subset of another list if every element of the list can be found, in order, in another list.
Here's my code:
def is_sublist_of_any_list(cand, lists):
# Compare candidate to a single list
def is_sublist_of_list(cand, target):
try:
i = 0
for c in cand:
i = 1 + target.index(c, i)
return True
except ValueError:
return False
# See if candidate matches any other list
return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))
# Compare candidates to all other lists
def super_lists(lists):
return [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
if __name__ == '__main__':
lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
superlists = super_lists(lists)
print superlists
Here are the results:
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
Edit: Results for your later data set.
>>> lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17,
18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2,
3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
>>> superlists = super_lists(lists)
>>> expected = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [5
0, 69], [2, 3, 21], [1, 2, 4, 8]]
>>> assert(superlists == expected)
>>> print superlists
[[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3,
21], [1, 2, 4, 8]]
Upvotes: 1
Reputation: 39354
Refined answer after new test case:
original= [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
class SetAndList:
def __init__(self,aList):
self.list=aList
self.set=set(aList)
self.isUnique=True
def compare(self,other):
if self.set.issubset(other.set):
#print self.list,'superceded by',other.list
self.isUnique=False
def listReduce(lists):
temp=[]
for l in lists:
s=SetAndList(l)
for t in temp:
t.compare(s)
s.compare(t)
temp.append( s )
temp=[t for t in temp if t.isUnique]
return [t.list for t in temp if t.isUnique]
print listReduce(original)
You didn't give the required output, but I'm guessing this is right, as [1,2,3]
does not appear in the output.
Upvotes: 0
Reputation: 54882
Edit: I really need to improve my reading comprehension. Here's the answer to what was actually asked. It exploits the fact that "A is super of B
" implies "len(A) > len(B) or A == B
".
def advance_to(it, value):
"""Advances an iterator until it matches the given value. Returns False
if not found."""
for item in it:
if item == value:
return True
return False
def has_supersequence(seq, super_sequences):
"""Checks if the given sequence has a supersequence in the list of
supersequences."""
candidates = map(iter, super_sequences)
for next_item in seq:
candidates = [seq for seq in candidates if advance_to(seq, next_item)]
return len(candidates) > 0
def find_supersequences(sequences):
"""Finds the supersequences in the given list of sequences.
Sequence A is a supersequence of sequence B if B can be created by removing
items from A."""
super_seqs = []
for candidate in sorted(sequences, key=len, reverse=True):
if not has_supersequence(candidate, super_seqs):
super_seqs.append(candidate)
return super_seqs
print(find_supersequences([[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3],
[2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]))
#Output: [[1, 2, 3, 4, 5, 6, 7], [1, 2, 4, 8], [2, 3, 21]]
If you need to also preserve the original order of the sequences, then the find_supersequences()
function needs to keep track of the positions of the sequences and sort the output afterwards.
Upvotes: 0
Reputation: 13085
list0=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
for list1 in list0[:]:
for list2 in list0:
if list2!=list1:
len1=len(list1)
c=0
for n in list2:
if n==list1[c]:
c+=1
if c==len1:
list0.remove(list1)
break
This filters list0 in place using a copy of it. This is good if the result is expected to be about the same size as the original, there is only a few "subset" to remove.
If the result is expected to be small and the original is large, you might prefer this one who is more memory freindly as it doesn't copy the original list.
list0=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
result=[]
for list1 in list0:
subset=False
for list2 in list0:
if list2!=list1:
len1=len(list1)
c=0
for n in list2:
if n==list1[c]:
c+=1
if c==len1:
subset=True
break
if subset:
break
if not subset:
result.append(list1)
Upvotes: 0
Reputation: 211982
This code should be rather memory efficient. Beyond storing your initial list of lists, this code uses negligible extra memory (no temporary sets or copies of lists are created).
def is_subset(needle,haystack):
""" Check if needle is ordered subset of haystack in O(n) """
if len(haystack) < len(needle): return False
index = 0
for element in needle:
try:
index = haystack.index(element, index) + 1
except ValueError:
return False
else:
return True
def filter_subsets(lists):
""" Given list of lists, return new list of lists without subsets """
for needle in lists:
if not any(is_subset(needle, haystack) for haystack in lists
if needle is not haystack):
yield needle
my_lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3],
[2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
print list(filter_subsets(my_lists))
>>> [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
And, just for fun, a one-liner:
def filter_list(L):
return [x for x in L if not any(set(x)<=set(y) for y in L if x is not y)]
Upvotes: 7
Reputation: 5819
This could be simplified, but:
l = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
l2 = l[:]
for m in l:
for n in l:
if set(m).issubset(set(n)) and m != n:
l2.remove(m)
break
print l2
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
Upvotes: 11
Reputation: 91299
I implemented a different issubseq
because yours doesn't say that [1, 2, 4, 5, 6]
is a subsequence of [1, 2, 3, 4, 5, 6, 7]
, for example (besides being painfully slow). The solution I came up with looks like this:
def is_subseq(a, b):
if len(a) > len(b): return False
start = 0
for el in a:
while start < len(b):
if el == b[start]:
break
start = start + 1
else:
return False
return True
def filter_partial_matches(sets):
return [s for s in sets if all([not(is_subseq(s, ss)) for ss in sets if s != ss])]
A simple test case, given your inputs and outputs:
>>> test = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
>>> another_test = [[1, 2, 3, 4], [2, 4, 3], [3, 4, 5]]
>>> filter_partial_matches(test)
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
>>> filter_partial_matches(another_test)
[[1, 2, 3, 4], [2, 4, 3], [3, 4, 5]]
Hope it helps!
Upvotes: 0
Reputation: 39354
This seems to work:
original=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
target=[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
class SetAndList:
def __init__(self,aList):
self.list=aList
self.set=set(aList)
self.isUnique=True
def compare(self,aList):
s=set(aList)
if self.set.issubset(s):
#print self.list,'superceded by',aList
self.isUnique=False
def listReduce(lists):
temp=[]
for l in lists:
for t in temp:
t.compare(l)
temp.append( SetAndList(l) )
return [t.list for t in temp if t.isUnique]
print listReduce(original)
print target
This prints the calculated list and the target for visual comparison.
Uncomment the print line in the compare method to see how various lists get superceded.
Tested with python 2.6.2
Upvotes: 0