Nico Schlömer

Reputation: 58721

find, collect duplicates in list of lists/sets

In Python, I have a list of pairs (2-element lists) and a list of integers with the same length, e.g.,

a = [
    [1, 2],
    [3, 2],
    [4, 66],
    [2, 3]
    ]

b = [
    1,
    31,
    31,
    44
    ]

The k-th entry in a can be thought of as being associated with the k-th entry in b.

The entries [3, 2] and [2, 3] are really the same for me, and I'd like a uniquified with that in mind. Also, I would like a list of the entries of b belonging to each entry of the new unique list. For the above example,

a2 = [
    [1, 2],
    [3, 2],  # or [2, 3]
    [4, 66]
    ]

b2 = [
    [1],
    [31, 44],
    [31]
    ]

b2[0] is [1] since [1, 2] is associated with only 1. b2[1] is [31, 44] since [2, 3] (which equals [3, 2]) is associated with 31 and 44 in b.

It's possible to go through a entry by entry, make each 2-list a frozenset, sort it into a dictionary etc. Needless to say, this doesn't perform very well if a and b are large.

Any hints on how to handle this smarter? (List comprehensions?)

Upvotes: 1

Views: 55

Answers (2)

Padraic Cunningham

Reputation: 180391

If you want to maintain order and group, I don't think you will get much better than grouping with an OrderedDict:

from collections import OrderedDict
a = [
    [1, 2],
    [3, 2],
    [4, 66],
    [2, 3]
    ]

b = [1, 31, 31, 44]
d = OrderedDict()
# A frozenset key makes [3, 2] and [2, 3] land in the same group.
for ind, f in enumerate(map(frozenset, a)):
    d.setdefault(f, []).append(b[ind])

print(list(d), list(d.values()))

Which would give you:

[frozenset({1, 2}), frozenset({2, 3}), frozenset({66, 4})] [[1], [31, 44], [31]]

If the order seen is irrelevant, use a defaultdict:

from collections import defaultdict
a = [
    [1, 2],
    [3, 2],
    [4, 66],
    [2, 3]
    ]

b = [1, 31, 31, 44]
d = defaultdict(list)
# Missing keys start out as an empty list, so no setdefault is needed.
for ind, f in enumerate(map(frozenset, a)):
    d[f].append(b[ind])

print(list(d), list(d.values()))

Which would give you:

[frozenset({1, 2}), frozenset({2, 3}), frozenset({66, 4})] [[1], [31, 44], [31]]

If you really want lists or tuples:

print(list(map(list, d)), list(d.values()))

Which would give you:

[[1, 2], [2, 3], [66, 4]] [[1], [31, 44], [31]]
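
As a variation on the grouping above, here is a minimal sketch that keeps the first-seen original pair (so [3, 2] rather than [2, 3]), as in the question's a2. It assumes Python 3.7+, where a plain dict preserves insertion order, and it swaps the frozenset key for a sorted tuple so that a pair such as [2, 2] is not collapsed to a single element. The function name group_unordered is hypothetical and not part of the original answer.

```python
def group_unordered(pairs, values):
    """Group values by pair, treating [x, y] and [y, x] as the same key."""
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for pair, value in zip(pairs, values):
        # Canonical key: a sorted tuple (assumption: elements are orderable).
        key = tuple(sorted(pair))
        # On first sight of a key, remember the original pair and a new list.
        _, collected = groups.setdefault(key, (pair, []))
        collected.append(value)
    a2 = [first for first, _ in groups.values()]
    b2 = [collected for _, collected in groups.values()]
    return a2, b2


a = [[1, 2], [3, 2], [4, 66], [2, 3]]
b = [1, 31, 31, 44]
a2, b2 = group_unordered(a, b)
print(a2)  # [[1, 2], [3, 2], [4, 66]]
print(b2)  # [[1], [31, 44], [31]]
```

This is still a single pass over a and b with one dictionary lookup per entry, so it should scale roughly linearly with the input size.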

For Python 2, you should use itertools.imap in place of map (and itertools.izip in place of zip, if you pair a and b with zip).

Upvotes: 5

Alec

Reputation: 1469

For a:

a = [
    [1, 2],
    [3, 2],
    [4, 66],
    [2, 3]
    ]

a_set = {frozenset(i) for i in a}
a2 = [list(i) for i in a_set]
print(a2)
# -> e.g. [[66, 4], [1, 2], [2, 3]] (set order is arbitrary)
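
Since a set does not keep the order in which the pairs were seen, here is a small sketch of an order-preserving variant. It assumes Python 3.7+ (insertion-ordered dicts) and sorts each pair for a stable representation. The name a2_ordered is only illustrative.

```python
a = [[1, 2], [3, 2], [4, 66], [2, 3]]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique = list(dict.fromkeys(frozenset(pair) for pair in a))

# Sort each frozenset so the element order inside a pair is deterministic.
a2_ordered = [sorted(s) for s in unique]
print(a2_ordered)  # [[1, 2], [2, 3], [4, 66]]
```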

Not sure what you're looking for with b.

Edit: That makes more sense. @PadraicCunningham's answer is spot-on.

Upvotes: 1
