Reputation: 3276
Given 2 lists:
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
I want to find the "overlap":
c = [3,4,5,5,6]
I'd also like it if i could extract the "remainder" the part of a and b that's not in c.
a_remainder = [5,]
b_remainder = [1,4,7,]
Note: a has three 5's in it and b has two. b has two 4's in it and a has one.
The resultant list c should have two 5's (limited by list b) and one 4 (limited by list a).
This gives me what i want, but I can't help but think there's a much better way.
import copy
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
c = []
for elem in copy.deepcopy(a):
if elem in b:
a.pop(a.index(elem))
c.append(b.pop(b.index(elem)))
# now a and b both contain the "remainders" and c contains the "overlap"
On another note, what is a more accurate name for what I'm asking for than "overlap" and "remainder"?
Upvotes: 22
Views: 32502
Reputation: 21055
collection.Counter
available in Python 2.7 can be used to implement multisets that do exactly what you want.
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
a_multiset = collections.Counter(a)
b_multiset = collections.Counter(b)
overlap = list((a_multiset & b_multiset).elements())
a_remainder = list((a_multiset - b_multiset).elements())
b_remainder = list((b_multiset - a_multiset).elements())
print overlap, a_remainder, b_remainder
Upvotes: 24
Reputation: 28666
Try difflib.SequenceMatcher(), "a flexible class for comparing pairs of sequences of any type"...
A quick try:
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
sm = difflib.SequenceMatcher(None, a, b)
c = []
a_remainder = []
b_remainder = []
for tag, i1, i2, j1, j2 in sm.get_opcodes():
if tag == 'replace':
a_remainder.extend(a[i1:i2])
b_remainder.extend(b[j1:j2])
elif tag == 'delete':
a_remainder.extend(a[i1:i2])
elif tag == 'insert':
b_remainder.extend(b[j1:j2])
elif tag == 'equal':
c.extend(a[i1:i2])
And now...
>>> print c
[3, 4, 5, 5, 6]
>>> print a_remainder
[5]
>>> print b_remainder
[1, 4, 7]
Upvotes: 1
Reputation: 27087
OK, verbose, but kind of cool (similar in spirit to the collections.Counter
idea, but more home-made):
import itertools as it
flatten = it.chain.from_iterable
sorted(
v for u,v in
set(flatten(enumerate(g)
for k, g in it.groupby(a))).intersection(
set(flatten(enumerate(g)
for k, g in it.groupby(b))))
)
The basic idea is to make each of the lists into a new list which attaches a counter to each object, numbered to account for duplicates -- so that then you can then use set
operations on these tuples after all.
To be slightly less verbose:
aa = set(flatten(enumerate(g) for k, g in it.groupby(a)))
bb = set(flatten(enumerate(g) for k, g in it.groupby(b)))
# aa = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5), (2, 5)])
# bb = set([(0, 1), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (1, 4), (1, 5)])
cc = aa.intersection(bb)
# cc = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5)])
c = sorted(v for u,v in cc)
# c = [3, 4, 5, 5, 6]
groupby
-- produces a list of lists containing identical elements
(but because of the syntax needs the g for k,g in it.groupby(a)
to extract each list)enumerate
-- appends a counter to each element of each sublistflatten
-- create a single listset
-- convert to a setintersection
-- find the common elementssorted(v for u,v in cc)
-- get rid of the counters and sort the resultFinally, I'm not sure what you mean by the remainders; it seems like it ought to be my aa-cc
and bb-cc
but I don't know where you get a_remainder = [4]
:
sorted(v for u,v in aa-cc)
# [5]
sorted(v for u,v in bb-cc)
# [1, 4, 7]
Upvotes: 4
Reputation: 3276
A response from kerio in #python on freenode:
[ i for i in itertools.chain.from_iterable([k] * v for k, v in \
(Counter(a) & Counter(b)).iteritems())
]
Upvotes: 2
Reputation: 208475
I don't think you should actually use this solution, but I took this opportunity to practice with lambda functions and here is what I came up with :)
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
dedup = lambda x: [set(x)] if len(set(x)) == len(x) else [set(x)] + dedup([x[i] for i in range(1, len(x)) if x[i] == x[i-1]])
default_set = lambda x: (set() if x[0] is None else x[0], set() if x[1] is None else x[1])
deduped = map(default_set, map(None, dedup(a), dedup(b)))
get_result = lambda f: reduce(lambda x, y: list(x) + list(y), map(lambda x: f(x[0], x[1]), deduped))
c = get_result(lambda x, y: x.intersection(y)) # [3, 4, 5, 6, 5]
a_remainder = get_result(lambda x, y: x.difference(y)) # [5]
b_remainder = get_result(lambda x, y: y.difference(x)) # [1, 7, 4]
I'm pretty sure izip_longest would have simplified this a bit (wouldn't have needed the default_set
lambda), but I was testing this with Python 2.5.
Here are some of the intermediate values used in the calculation in case anyone wants to understand this:
dedup(a) = [set([3, 4, 5, 6]), set([5]), set([5])]
dedup(b) = [set([1, 3, 4, 5, 6, 7]), set([4, 5])]
deduped = [(set([3, 4, 5, 6]), set([1, 3, 4, 5, 6, 7])), (set([5]), set([4, 5])), (set([5]), set([]))]
Upvotes: -1
Reputation: 13289
In the language of sets, overlap is 'intersection' and remainder is 'set difference'. If you had distinct items, you wouldn't have to do these operations yourself, check out http://docs.python.org/library/sets.html if you're interested.
Since we're not working with distinct elements, your approach is reasonable. If you wanted this to run faster, you could create a dictionary for each list and map the number to how many elements are in each array (e.g., in a, 3->1, 4->1, 5->2, etc.). You would then iterate through map a, determine if that letter existed, decrement its count and add it to the new list
Untested code, but this is the idea
def add_or_update(map,value):
if value in map:
map[value]+=1
else
map[value]=1
b_dict = dict()
for b_elem in b:
add_or_update(b_dict,b_elem)
intersect = []; diff = [];
for a_elem in a:
if a_elem in b_dict and b_dict[a_elem]>0:
intersect.add(a_elem);
for k,v in diff:
for i in range(v):
diff.add(k);
Upvotes: 6
Reputation: 6552
Use python set
intersection = set(a) & set(b)
a_remainder = set(a) - set(b)
b_remainder = set(b) - set(a)
Upvotes: 20
Reputation: 14304
Aset = Set(a);
Bset = Set(b);
a_remainder = a.difference(b);
b_remainder = b.difference(a);
c = a.intersection(b);
But if you need c
to have duplicates, and order is important for you,
you may look for w:Longest common subsequence problem
Upvotes: 0