user3262037
user3262037

Reputation: 15

Efficiently find overlap of a lists of tuples

I have a bunch of lists, each composed of tuples.

A = [(1,2,3),(4,5,7),(8,9,10),(5,6,2)]
B = [(1,3,6),(4,2,8),(3,6,7),(5,2,8)]
C = [(6,2,3),(1,7,2),(5,7,2),(7,2,7)]

I need to find the group of tuples such that the first element of the tuple appears in every list. (I know this is very confusing) For my above example, the overlap would be:

overlap = [(1,2,3),(1,3,6),(1,7,2),(5,6,2),(5,2,8),(5,7,2)]

This is because a tuple with the number '1' in the first element of the tuple appears in every list. This is the same for the number '5'.

What is the best way to do this?

I currently have working code, but I feel like there is a better way to do this.

big_list = [A,B,C]
overlap = []
all_points = list(set([item for item in big_list]))
for (f,s,t) in all_points:
    in_all = True
    for lest in big_list:
        present = False
        for (first, second, third) in lest:
            if first == f:
                present = True
        if not present:
            in_all = False
    if in_all:
        overlap.append((f,s,t))

Upvotes: 0

Views: 1026

Answers (3)

Roger Jiang
Roger Jiang

Reputation: 66

Since you have a list of lists, I will do one loop with another level of list comprehension. A two level list comprehension will be too confusing to read.

first_letters = None
for l in [A, B, C]:
   f = set([ i[0] for i in l ])
   first_letters = first_letters & f if first_letters else f 
overlap = [ i for i in A + B + C if i[0] in first_letters ]

alternatively, if you prefer no explicit looping,

def func1(x, y):
    f = set([i[0] for i in y])
    return x & f if x else f
first_letters = reduce(func1, [A, B, C], None)
overlap = [ i for i in A + B + C if i[0] in first_letters ]

if you know all possible values, you can further simplify to following codes:

all_possible = set(range(9))
first_letters = reduce( lambda x, y: x & set([ i[0] for i in y ]), [A, B, C], all_possible )
overlap = [ i for i in A + B + C if i[0] in first_letters ]

Upvotes: 0

doubleo
doubleo

Reputation: 4759

As you only care about first elements, you should not have to run multiple loops. Use a set instead. Comments are provided inline.

A = [(1,2,3),(4,5,7),(8,9,10),(5,6,2)]
B = [(1,3,6),(4,2,8),(3,6,7),(5,2,8)] 
C = [(6,2,3),(1,7,2),(5,7,2),(7,2,7)]

def tuple_F(n_list):
    # You only care about unique first elements, so using a set will be more efficient.
    return set(nl[0] for n_l in n_list)

set_FA = tuple_F(A)
set_FB = tuple_F(B)
set_FC = tuple_F(C)

# Python makes it ridiculously easy to take intersection of multiple sets at one shot
set_ABC = set.intersection(set_FA, set_FB, set_FC)

# And again, python makes it really easy to merge sets using just a A+B+C
overlap = [tup for tup in A+B+C if tup[0] in set_ABC]

print overlap

This would print:

[(1, 2, 3), (5, 6, 2), (1, 3, 6), (5, 2, 8), (1, 7, 2), (5, 7, 2)]

Hope this helps!

Upvotes: 1

Ashwini Chaudhary
Ashwini Chaudhary

Reputation: 251096

You can use set intersection for this:

>>> from itertools import chain
>>> def get_first(seq):                                       
        return (x[0] for x in seq)
>>> common = set(get_first(A)).intersection(get_first(B), get_first(C))

Now common contains:

>>> common
set([1, 5])

Now we can loop over individual items from A, B and C and choose those tuples whose first item is found in common:

>>> [x for x in chain(A, B, C) if x[0] in common]
[(1, 2, 3), (5, 6, 2), (1, 3, 6), (5, 2, 8), (1, 7, 2), (5, 7, 2)]

Sort by first item:

>>> from operator import itemgetter
>>> sorted((x for x in chain(A, B, C) if x[0] in common), key=itemgetter(0))
[(1, 2, 3), (1, 3, 6), (1, 7, 2), (5, 6, 2), (5, 2, 8), (5, 7, 2)]

Upvotes: 2

Related Questions