Reputation: 849
I have a set of unique tuples that looks like the following. The first value is the name, the second value is the ID, and the third value is the type.
('9', '0000022', 'LRA')
('45', '0000016', 'PBM')
('16', '0000048', 'PBL')
('304', '0000042', 'PBL')
('7', '0000014', 'IBL')
('12', '0000051', 'LRA')
('7', '0000014', 'PBL')
('68', '0000002', 'PBM')
('356', '0000049', 'PBL')
('12', '0000051', 'PBL')
('15', '0000015', 'PBL')
('32', '0000046', 'PBL')
('9', '0000022', 'PBL')
('10', '0000007', 'PBM')
('7', '0000014', 'LRA')
('439', '0000005', 'PBL')
('4', '0000029', 'LRA')
('41', '0000064', 'PBL')
('10', '0000007', 'IBL')
('8', '0000006', 'PBL')
('331', '0000040', 'PBL')
('9', '0000022', 'IBL')
This set includes duplicates of the name/ID combination, but they each have a different type. For example:
('9', '0000022', 'LRA')
('9', '0000022', 'PBL')
('9', '0000022', 'IBL')
What I would like to do is process this set of tuples so that I can create a new list where each name/ID combination would only appear once, but include all types. This list should only include the name/ID combos that have more than one type. For example, my output would look like this:
('9', '0000022', 'LRA', 'PBL', 'IBL')
('7', '0000014', 'IBL', 'PBL', 'LRA')
but my output should not include name/ID combos that have only one type:
('45', '0000016', 'PBM')
('16', '0000048', 'PBL')
Any help is appreciated!
Upvotes: 1
Views: 136
Reputation: 622
A one-liner for science (the other answers are much more readable and probably more correct):
testlist=[('9', '0000022', 'LRA'),
('45', '0000016', 'PBM'),
('16', '0000048', 'PBL'),
('304', '0000042', 'PBL'),etc.]
from collections import Counter
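# For every (name, id) pair that occurs more than once, append all of its types.
# .items() and print() work on both Python 2 and Python 3.
new_list = [(a1, b1) + tuple(c for (a, b, c) in testlist if (a, b) == (a1, b1))
            for (a1, b1) in [pair for pair, count
                             in Counter((a, b) for (a, b, c) in testlist).items()
                             if count > 1]]
print(new_list)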
yields:
[('9', '0000022', 'LRA', 'PBL', 'IBL'),
('12', '0000051', 'LRA', 'PBL'),
('10', '0000007', 'PBM', 'IBL'),
('7', '0000014', 'IBL', 'PBL', 'LRA')]
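The one-liner is easier to follow when unrolled into steps. What follows is a minimal Python 3 sketch of the same idea rather than the exact code above: the names pair_counts and repeated are made up for illustration, and testlist holds only a handful of rows from the question.

```python
from collections import Counter

# A subset of the rows from the question, for illustration only.
testlist = [
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('7', '0000014', 'PBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'LRA'),
    ('9', '0000022', 'IBL'),
]

# Step 1: count how often each (name, id) pair occurs.
pair_counts = Counter((name, id_) for name, id_, _ in testlist)

# Step 2: keep only the pairs that occur more than once.
repeated = [pair for pair, count in pair_counts.items() if count > 1]

# Step 3: for each repeated pair, collect its types in input order.
new_list = [
    pair + tuple(t for name, id_, t in testlist if (name, id_) == pair)
    for pair in repeated
]

print(new_list)
```

Note that step 3 rescans testlist once per repeated pair, so the work grows roughly quadratically with the input; that is fine for small data but is one reason the dictionary-based approaches scale better.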
Upvotes: 1
Reputation: 25974
It's pretty straightforward to accumulate with a defaultdict and then filter:
from collections import defaultdict
d = defaultdict(list)
for tup in list_of_tuples:
    d[(tup[0], tup[1])].append(tup[2])  # key on (name, id), collect the types
d
Out[15]: defaultdict(<class 'list'>, {('16', '0000048'): ['PBL'], ('9', '0000022'): ['LRA', 'PBL', 'IBL'], ('12', '0000051'): ['LRA', 'PBL'], ('304', '0000042'): ['PBL'], ('331', '0000040'): ['PBL'], ('41', '0000064'): ['PBL'], ('356', '0000049'): ['PBL'], ('15', '0000015'): ['PBL'], ('8', '0000006'): ['PBL'], ('4', '0000029'): ['LRA'], ('7', '0000014'): ['IBL', 'PBL', 'LRA'], ('32', '0000046'): ['PBL'], ('68', '0000002'): ['PBM'], ('439', '0000005'): ['PBL'], ('10', '0000007'): ['PBM', 'IBL'], ('45', '0000016'): ['PBM']})
And then filter:
[(key,val) for key,val in d.items() if len(val) > 1]
Out[29]:
[(('9', '0000022'), ['LRA', 'PBL', 'IBL']),
(('12', '0000051'), ['LRA', 'PBL']),
(('7', '0000014'), ['IBL', 'PBL', 'LRA']),
(('10', '0000007'), ['PBM', 'IBL'])]
And if you really want to get it back into that original format:
from itertools import chain
[tuple(chain.from_iterable(tup)) for tup in d.items() if len(tup[1]) > 1]
Out[27]:
[('9', '0000022', 'LRA', 'PBL', 'IBL'),
('12', '0000051', 'LRA', 'PBL'),
('7', '0000014', 'IBL', 'PBL', 'LRA'),
('10', '0000007', 'PBM', 'IBL')]
Though I think it makes the most sense to keep it as a dict with (name, id) tuples as the keys, as we generated in the first step.
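If the dict form is kept, the accumulate and filter steps can be wrapped in one small helper. This is only a sketch assuming Python 3; the function name types_by_name_and_id and the variable rows are hypothetical, and the sample rows are a subset of the question's data.

```python
from collections import defaultdict


def types_by_name_and_id(rows):
    """Map (name, id) -> list of types, keeping only keys with 2+ types."""
    grouped = defaultdict(list)
    for name, id_, type_ in rows:
        grouped[(name, id_)].append(type_)
    # Return a plain dict so that looking up a missing key raises KeyError
    # instead of silently creating an empty list.
    return {key: types for key, types in grouped.items() if len(types) > 1}


# A subset of the rows from the question, for illustration only.
rows = [
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'PBL'),
    ('10', '0000007', 'PBM'),
]

result = types_by_name_and_id(rows)
for (name, id_), types in result.items():
    print(name, id_, types)
```

Because rows is a list here, the types come out in input order. With a set, as in the question, the order of the types within each list is not guaranteed.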
Upvotes: 1
Reputation: 38257
itertools.groupby with some additional processing of what it outputs will do the job:
from itertools import groupby
data = {
('9', '0000022', 'LRA'),
('45', '0000016', 'PBM'),
('16', '0000048', 'PBL'),
...
}
def group_by_name_and_id(s):
    # groupby only merges adjacent items, so sort first
    grouped = groupby(sorted(s), key=lambda t: (t[0], t[1]))
    for (name, id_), items in grouped:
        types = tuple(type_ for _, _, type_ in items)
        if len(types) > 1:
            yield (name, id_) + types

print('\n'.join(str(x) for x in group_by_name_and_id(data)))
outputs:
('10', '0000007', 'IBL', 'PBM')
('12', '0000051', 'LRA', 'PBL')
('7', '0000014', 'IBL', 'LRA', 'PBL')
('9', '0000022', 'IBL', 'LRA', 'PBL')
P.S. I don't really like that design, though: the types could (and really should) be a list held in the third item of the tuple rather than being part of the tuple itself. As it stands, the tuple's length is dynamic, which is ugly; tuples aren't meant to be used like that. So it is best to replace
types = tuple(type_ for _, _, type_ in items)
yield (name, id_) + types
with
types = [type_ for _, _, type_ in items]
yield (name, id_, types)
yielding the much cleaner looking
('10', '0000007', ['IBL', 'PBM'])
('12', '0000051', ['LRA', 'PBL'])
('7', '0000014', ['IBL', 'LRA', 'PBL'])
('9', '0000022', ['IBL', 'LRA', 'PBL'])
For example, you can then iterate over the resulting data with a loop such as: for name, id, types in transformed_data:
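Putting the list-based variant together, a self-contained sketch might look like the following. It assumes Python 3; the function name group_types and the use of operator.itemgetter for the key are illustrative choices rather than part of the answer above, and data is a subset of the question's set.

```python
from itertools import groupby
from operator import itemgetter


def group_types(rows):
    """Yield (name, id, [types]) for each name/ID pair with 2+ types."""
    name_and_id = itemgetter(0, 1)
    # groupby only merges adjacent items, so the rows are sorted first.
    for (name, id_), items in groupby(sorted(rows), key=name_and_id):
        types = [type_ for _, _, type_ in items]
        if len(types) > 1:
            yield (name, id_, types)


# A subset of the set from the question, for illustration only.
data = {
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('7', '0000014', 'IBL'),
    ('7', '0000014', 'PBL'),
    ('9', '0000022', 'PBL'),
    ('7', '0000014', 'LRA'),
    ('9', '0000022', 'IBL'),
    ('10', '0000007', 'PBM'),
    ('10', '0000007', 'IBL'),
}

transformed_data = list(group_types(data))

# Every item has exactly three fields, so unpacking is uniform.
for name, id_, types in transformed_data:
    print(name, id_, types)
```

Since the rows are sorted as tuples of strings, names compare lexicographically ('10' sorts before '7'), and the types within each group come out in alphabetical order.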
Upvotes: 3