I have lists like this:
a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536), ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]
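Tuples should be paired when their names match ignoring case and their third items are equal; a tuple without a partner is paired with None. I want the resulting list to look like this: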
result = [(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)), (('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)), (('King', 16810, 0.005), None), (None, ('MeSSi', 115, 0.369))]
I have tried nested loops, sets, map and zip, but I couldn't get this output. Kindly help me out.
from itertools import groupby
from operator import itemgetter

def compose(f, g):
    # h(*args) == f(*g(*args)): g's result tuple is unpacked into f
    def h(*args, **kwargs):
        return f(*g(*args, **kwargs))
    return h

def lower_first(*args):
    # Lower-case the name (first field) and leave the other fields unchanged
    return (args[0].lower(),) + args[1:]

# Sort by (name, third item, second item); group by (name, third item)
sorting_key = compose(lower_first, itemgetter(0, 2, 1))
grouping_key = compose(lower_first, itemgetter(0, 2))
output = [tuple(v) for k, v in groupby(sorted(a + b, key=sorting_key),
                                       key=grouping_key)]
This sets output to:
[(('JeSsIcA', 1268, 0.0536),),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('King', 16810, 0.005),),
(('MeSSi', 115, 0.369),)]
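To see why the groups form this way, it can help to evaluate the two keys on a single tuple. The definitions are repeated here (in Python 3 form) so the snippet runs on its own; the variable t is just an illustrative tuple taken from b.

```python
from operator import itemgetter

def compose(f, g):
    # h(*args) == f(*g(*args)): g's result tuple is unpacked into f
    def h(*args, **kwargs):
        return f(*g(*args, **kwargs))
    return h

def lower_first(*args):
    # Lower-case the first field, keep the rest as-is
    return (args[0].lower(),) + args[1:]

sorting_key = compose(lower_first, itemgetter(0, 2, 1))
grouping_key = compose(lower_first, itemgetter(0, 2))

t = ('JON', 16680, 0.00305)
print(sorting_key(t))   # ('jon', 0.00305, 16680)
print(grouping_key(t))  # ('jon', 0.00305)

# The tuple from a lands in the same group because its grouping key is equal
print(grouping_key(('JoN', 1668, 0.00305)) == grouping_key(t))  # True
```

The second field only takes part in sorting, so tuples that differ just in that field still fall into the same group.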
Then adding the None values is easy:
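# A group of two is already a matched pair; a lone element gets None on the
# side it is missing from (if it is not in a, it must have come from b).
final_output = [elem if len(elem) >= 2
                else (None,) + elem if elem[0] not in a
                else elem + (None,)
                for elem in output]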
which gives:
[(('JeSsIcA', 1268, 0.0536), None),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('King', 16810, 0.005), None),
(None, ('MeSSi', 115, 0.369))]
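One caveat with the step above: inside a group, tuples are ordered by their second field, so if the tuple from b had the smaller second field, a two-element group would come out as (b item, a item). The test elem[0] not in a also scans a for every lone element. A minimal variant, assuming the same matching rule, tags each tuple with the list it came from so its slot in the pair is known directly. The helper names outer_pairs and match_key are hypothetical.

```python
from itertools import groupby

a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536), ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]

def match_key(tagged):
    # tagged is (source, tuple); match on lower-cased name and third item
    _, t = tagged
    return (t[0].lower(), t[2])

def outer_pairs(a, b):
    # Tag each tuple with its slot in the result: 0 for a, 1 for b
    tagged = [(0, t) for t in a] + [(1, t) for t in b]
    result = []
    for _, group in groupby(sorted(tagged, key=match_key), key=match_key):
        pair = [None, None]
        for source, t in group:
            # A later duplicate from the same list overwrites an earlier one
            pair[source] = t
        result.append(tuple(pair))
    return result

pairs = outer_pairs(a, b)
```

For the sample data this produces the same list as final_output, and a pair such as ('Jon', 500, 0.1) in a with ('jon', 100, 0.1) in b still comes out in (a, b) order.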
But be careful: stating a problem like this with plain lists often glosses over relational-join issues (duplicate keys, one-to-many matches) that a system with proper indexing, such as a pandas.DataFrame, takes care of. A DataFrame seems more likely to be the kind of data structure you want, thanks to its native join and merge capabilities.
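If you would rather stay with plain lists, the following is a minimal standard-library sketch of a hash-based full outer join, showing what proper join semantics mean for duplicate keys: every pair of rows sharing a key is emitted, and unmatched rows get None on the other side. The names full_outer_join and name_value_key are hypothetical helpers, not part of pandas or any library, and the key assumes the same matching rule as above.

```python
from collections import defaultdict

a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536), ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]

def name_value_key(t):
    # Match on the name ignoring case, plus the third field
    return (t[0].lower(), t[2])

def full_outer_join(left, right, key):
    # Index the right-hand rows by key (a list per key keeps duplicates)
    index = defaultdict(list)
    for row in right:
        index[key(row)].append(row)

    result, matched = [], set()
    for row in left:
        k = key(row)
        if k in index:
            matched.add(k)
            # Many-to-many: pair this row with every right row sharing the key
            result.extend((row, other) for other in index[k])
        else:
            result.append((row, None))

    # Right-hand rows whose key never appeared on the left
    for row in right:
        if key(row) not in matched:
            result.append((None, row))
    return result

result = full_outer_join(a, b, name_value_key)
```

The output follows the order of a, followed by the unmatched rows of b. With duplicates, full_outer_join([('X', 1, 0.1)], [('x', 2, 0.1), ('X', 3, 0.1)], name_value_key) returns two pairs rather than silently dropping one row, which mirrors how an outer merge treats repeated keys.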
Convert a and b to dictionaries first, keyed on the first item (lower-cased with str.lower()) and the third item, then loop over the union of the keys in a list comprehension to get the desired output:
>>> from pprint import pprint
>>> dct_a = {(x[0].lower(), x[2]): x for x in a}
>>> dct_b = {(x[0].lower(), x[2]): x for x in b}
>>> out = [(dct_a.get(k), dct_b.get(k)) for k in set(dct_a).union(dct_b)]
>>> pprint(out)
[(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('King', 16810, 0.005), None),
(('JeSsIcA', 1268, 0.0536), None),
(None, ('MeSSi', 115, 0.369))]
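Because set(dct_a).union(dct_b) is a set, the order of out is not guaranteed and can change between runs (string hashing is randomized by default in Python 3). If a stable order matters, one option, sketched below under the assumption of Python 3.7+ insertion-ordered dicts, is to collect the keys with dict.fromkeys instead. Either way, if one list contains two tuples with the same key, the dict comprehension keeps only the last one.

```python
from pprint import pprint

a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536), ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]

# Key: (lower-cased name, third item); a later duplicate key overwrites an earlier one
dct_a = {(x[0].lower(), x[2]): x for x in a}
dct_b = {(x[0].lower(), x[2]): x for x in b}

# dict.fromkeys keeps first-seen order: keys from a first, then keys only in b
keys = dict.fromkeys(list(dct_a) + list(dct_b))
out = [(dct_a.get(k), dct_b.get(k)) for k in keys]
pprint(out)
```

The result follows the order of a, with tuples found only in b appended at the end.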