Chris Nielsen

Reputation: 849

processing a set of unique tuples

I have a set of unique tuples that looks like the following. The first value is the name, the second value is the ID, and the third value is the type.

('9', '0000022', 'LRA')
('45', '0000016', 'PBM')
('16', '0000048', 'PBL')
('304', '0000042', 'PBL')
('7', '0000014', 'IBL')
('12', '0000051', 'LRA')
('7', '0000014', 'PBL')
('68', '0000002', 'PBM')
('356', '0000049', 'PBL')
('12', '0000051', 'PBL')
('15', '0000015', 'PBL')
('32', '0000046', 'PBL')
('9', '0000022', 'PBL')
('10', '0000007', 'PBM')
('7', '0000014', 'LRA')
('439', '0000005', 'PBL')
('4', '0000029', 'LRA')
('41', '0000064', 'PBL')
('10', '0000007', 'IBL')
('8', '0000006', 'PBL')
('331', '0000040', 'PBL')
('9', '0000022', 'IBL')

This set includes duplicates of the name/ID combination, but they each have a different type. For example:

('9', '0000022', 'LRA')
('9', '0000022', 'PBL')
('9', '0000022', 'IBL')

What I would like to do is process this set of tuples so that I can create a new list where each name/ID combination would only appear once, but include all types. This list should only include the name/ID combos that have more than one type. For example, my output would look like this:

('9', '0000022', 'LRA', 'PBL', 'IBL')
('7', '0000014', 'IBL', 'PBL', 'LRA')

but my output should not include name/ID combos that have only one type:

('45', '0000016', 'PBM')
('16', '0000048', 'PBL')

Any help is appreciated!

Upvotes: 1

Views: 136

Answers (3)

Garth5689

Reputation: 622

A one-liner for science (the other answers are much more readable and probably more correct):

testlist = [('9', '0000022', 'LRA'),
            ('45', '0000016', 'PBM'),
            ('16', '0000048', 'PBL'),
            ('304', '0000042', 'PBL'),
            # etc.
            ]


from collections import Counter

# .items() and print() behave the same on Python 2 and Python 3
new_list = [(a1, b1) + tuple(c for (a, b, c) in testlist if (a, b) == (a1, b1))
            for (a1, b1), count in Counter((a, b) for (a, b, c) in testlist).items()
            if count > 1]

print(new_list)

yields:

[('9', '0000022', 'LRA', 'PBL', 'IBL'),
 ('12', '0000051', 'LRA', 'PBL'), 
 ('10', '0000007', 'PBM', 'IBL'), 
 ('7', '0000014', 'IBL', 'PBL', 'LRA')]
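
For readability, the same Counter idea can be unrolled into two explicit passes. This is only a sketch: the sample data is a small subset of the question's set, and the names pair_counts and types are mine, not from the question.

```python
from collections import Counter

# Small sample of the question's data (assumption: only a subset is used here).
testlist = [
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'PBL'),
    ('9', '0000022', 'IBL'),
]

# First pass: count how many times each (name, id) pair occurs.
pair_counts = Counter((name, id_) for name, id_, _ in testlist)

# Second pass: for every pair seen more than once, collect its types
# in the order they appear in the input.
new_list = []
for pair, count in pair_counts.items():
    if count > 1:
        types = tuple(t for name, id_, t in testlist if (name, id_) == pair)
        new_list.append(pair + types)

print(new_list)
```

Note that this rescans testlist once per repeated pair, so it is quadratic in the worst case. That is fine for a few dozen tuples but probably not what you want for large inputs.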

Upvotes: 1

roippi

Reputation: 25974

Pretty straightforward to accumulate with a defaultdict and then filter:

from collections import defaultdict

d = defaultdict(list)
for tup in list_of_tuples:
    d[(tup[0],tup[1])].append(tup[2])

d
Out[15]: defaultdict(<class 'list'>, {('16', '0000048'): ['PBL'], ('9', '0000022'): ['LRA', 'PBL', 'IBL'], ('12', '0000051'): ['LRA', 'PBL'], ('304', '0000042'): ['PBL'], ('331', '0000040'): ['PBL'], ('41', '0000064'): ['PBL'], ('356', '0000049'): ['PBL'], ('15', '0000015'): ['PBL'], ('8', '0000006'): ['PBL'], ('4', '0000029'): ['LRA'], ('7', '0000014'): ['IBL', 'PBL', 'LRA'], ('32', '0000046'): ['PBL'], ('68', '0000002'): ['PBM'], ('439', '0000005'): ['PBL'], ('10', '0000007'): ['PBM', 'IBL'], ('45', '0000016'): ['PBM']})

And then filter:

[(key,val) for key,val in d.items() if len(val) > 1]
Out[29]: 
[(('9', '0000022'), ['LRA', 'PBL', 'IBL']),
 (('12', '0000051'), ['LRA', 'PBL']),
 (('7', '0000014'), ['IBL', 'PBL', 'LRA']),
 (('10', '0000007'), ['PBM', 'IBL'])]

And if you really want to get it back into that original format:

from itertools import chain

[tuple(chain.from_iterable(tup)) for tup in d.items() if len(tup[1]) > 1]
Out[27]: 
[('9', '0000022', 'LRA', 'PBL', 'IBL'),
 ('12', '0000051', 'LRA', 'PBL'),
 ('7', '0000014', 'IBL', 'PBL', 'LRA'),
 ('10', '0000007', 'PBM', 'IBL')]

Though I think it makes the most sense to keep it as a dict with (name, id) tuples as the keys, as we generated in the first step.
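
To illustrate why the dict form is convenient, here is a minimal sketch that keeps only the multi-type entries in a plain dict and then looks one up by its (name, id) key. The sample data is a subset of the question's set, and the name multi is just something I picked for illustration.

```python
from collections import defaultdict

# Small sample of the question's data (assumption: only a subset is used here).
list_of_tuples = [
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'PBL'),
    ('9', '0000022', 'IBL'),
]

# Accumulate the types for each (name, id) key.
d = defaultdict(list)
for name, id_, type_ in list_of_tuples:
    d[(name, id_)].append(type_)

# Keep only the keys that collected more than one type.
multi = {key: types for key, types in d.items() if len(types) > 1}

# Direct lookup by (name, id), and a membership test for a single-type pair.
print(multi[('9', '0000022')])
print(('45', '0000016') in multi)
```

With this shape you get constant-time lookups per name/ID pair, which the flattened tuples can't give you without a scan.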

Upvotes: 1

Erik Kaplun

Reputation: 38257

itertools.groupby with some additional processing of what it outputs will do the job:

from itertools import groupby

data = {
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('16', '0000048', 'PBL'),
    ...
}

def group_by_name_and_id(s):
    # sort first: groupby only merges adjacent items; the key is (name, id)
    grouped = groupby(sorted(s), key=lambda t: (t[0], t[1]))
    for (name, id_), items in grouped:
        types = tuple(type_ for _, _, type_ in items)
        if len(types) > 1:
            yield (name, id_) + types

print('\n'.join(str(x) for x in group_by_name_and_id(data)))

outputs:

('10', '0000007', 'IBL', 'PBM')
('12', '0000051', 'LRA', 'PBL')
('7', '0000014', 'IBL', 'LRA', 'PBL')
('9', '0000022', 'IBL', 'LRA', 'PBL')

P.S. I don't really like that design, though: the types could/should really be a list held in the 3rd item of the tuple, not part of the tuple itself. As it stands the tuple is dynamic in length, and that's ugly; tuples aren't meant to be used like that. So it's best to replace

        types = tuple(type_ for _, _, type_ in items)
        yield (name, id_) + types

with

        types = [type_ for _, _, type_ in items]
        yield (name, id_, types)

yielding the much cleaner looking

('10', '0000007', ['IBL', 'PBM'])
('12', '0000051', ['LRA', 'PBL'])
('7', '0000014', ['IBL', 'LRA', 'PBL'])
('9', '0000022', ['IBL', 'LRA', 'PBL'])

For example, you can then just iterate over the resulting data with for name, id, types in transformed_data:.
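
Putting the pieces together, here is a sketch of the list-based variant written for Python 3, followed by that iteration. The sample data is a subset of the question's set, transformed_data is just the name used in the sentence above, and itemgetter(0, 1) is my substitution for the lambda key.

```python
from itertools import groupby
from operator import itemgetter

# Small sample of the question's data (assumption: only a subset is used here).
data = {
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'PBL'),
    ('9', '0000022', 'IBL'),
}

def group_by_name_and_id(s):
    # groupby only merges adjacent items, so the input is sorted first;
    # itemgetter(0, 1) builds the (name, id) key for each tuple.
    for (name, id_), items in groupby(sorted(s), key=itemgetter(0, 1)):
        types = [type_ for _, _, type_ in items]
        if len(types) > 1:
            yield (name, id_, types)

transformed_data = list(group_by_name_and_id(data))

# Every item has exactly three fields, so unpacking is always safe.
for name, id_, types in transformed_data:
    print(name, id_, ', '.join(types))
```

Because the input is sorted as strings, names come out in lexicographic order ('7' before '9', but '10' before '7'), and the types within each group are sorted too.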

Upvotes: 3
