I extract specific names from text using regex, etc. The result is a list of tuples containing titles and names, and the tuples may have different lengths. lst below shows a list of possible scenarios. I need to remove duplicate names from the result. For example, ('Lord', 'Justice') should be treated as the same name as ('Lord', 'Justice', 'Smith'), and ('Lady', 'Smiles') as the same name as ('Lady', 'Justice', 'Smiles'), but ('Lord', 'Justice', 'Smith') and ('Lady', 'Justice', 'Smiles') are different names. The desired output for each element in lst is [('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles')].
lst = [[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lady', 'Smiles')]]
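Judging only from the examples above, the rule seems to be: two tuples are the same name if they share the title and the shorter tuple's remaining words appear in the longer one in the same order. That "ordered subsequence" rule is an assumption inferred from the examples, and is_shorter_form is a hypothetical helper name. A minimal sketch of the check:

```python
def is_shorter_form(short, full):
    """Return True if `short` looks like an abbreviated form of `full`.

    Assumed rule (inferred from the examples, not stated explicitly):
    same title, and the remaining words of `short` appear in `full`
    in the same order (an ordered subsequence).
    """
    if len(short) >= len(full) or short[0] != full[0]:
        return False
    rest = iter(full[1:])
    # `word in rest` consumes the iterator up to the match, which enforces order
    return all(word in rest for word in short[1:])


print(is_shorter_form(('Lord', 'Justice'), ('Lord', 'Justice', 'Smith')))            # True
print(is_shorter_form(('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles')))            # True
print(is_shorter_form(('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')))  # False
```

Using an iterator for the containment test is what makes the word order matter: once a word is matched, earlier words of `full` can no longer be matched.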
This is what I have so far, but it doesn't produce the desired output. I would really appreciate your help and suggestions.
for l in lst:
    print(l)
    # remove duplicates based on the last element of each tuple
    lst_1 = list(dict((v[-1], v) for v in sorted(l, key=lambda t: t[0])).values())
    print(lst_1)
    # remove duplicates based on the second element [1] of each tuple
    lst_2 = list(dict((v[1], v) for v in sorted(lst_1, key=lambda t: t[0])).values())
    print(lst_2)
    print("\n")
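One reason the attempt fails: the second pass keys the dictionary on v[1], and both full names have 'Justice' in that position, so the later tuple overwrites the earlier one. A minimal reproduction using just the two full names from lst:

```python
# Second pass of the attempt above: key a dict on the middle element (v[1]).
names = [('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles')]
by_middle = dict((v[1], v) for v in names)
# Both tuples share the key 'Justice', so only the last one survives.
print(list(by_middle.values()))  # [('Lady', 'Justice', 'Smiles')]
```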
UPDATE:
I was probably too specific in my examples. I should have included other names; the solution needs to work when other names are present as well:
lst = [
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lady', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
]
Desired output:
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
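A sketch that reproduces this desired output without depending on the order of the tuples, assuming the "same title plus ordered subsequence" rule inferred from the examples (is_shorter_form and remove_duplicate_names are hypothetical names). It compares every tuple against every other one, which is quadratic but fine for short per-sentence lists:

```python
def is_shorter_form(short, full):
    # Assumed rule: same title, and the other words of `short` occur in `full` in order.
    if len(short) >= len(full) or short[0] != full[0]:
        return False
    rest = iter(full[1:])
    return all(word in rest for word in short[1:])


def remove_duplicate_names(names):
    """Drop tuples that are shorter forms of another tuple; keep first-seen order."""
    result = []
    for name in names:
        if name in result:
            continue  # exact duplicate already kept
        if any(is_shorter_form(name, other) for other in names):
            continue  # a fuller version of this name exists somewhere in the list
        result.append(name)
    return result


lst = [
    [('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice'),
     ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
    [('Lord', 'Justice', 'Smith'), ('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles'),
     ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
]
for row in lst:
    print(remove_duplicate_names(row))
```

The rule is ambiguous when one short form matches several full names (for example a hypothetical ('Lord', 'Justice', 'Jones') next to ('Lord', 'Justice', 'Smith')); this sketch simply drops the short form in that case.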
You can do this easily using itertools.groupby:
from itertools import groupby
lst = [
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lady', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
]
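# group consecutive tuples by title and keep the longest one in each group
# (reversed() makes ties go to the last tuple of the group)
res = [[max(reversed(list(v)), key=len) for k, v in groupby(sl, lambda x: x[0])] for sl in lst]
for l in res:
    print(l)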
Output:
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
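Note that this output is missing ('Lady', 'Another'), which is in the desired output. groupby starts a new group every time the key changes, so it only merges adjacent tuples with the same title; ('Lady', 'Another') and ('Lady', 'Diana', 'Spencer') happen to sit next to each other, land in one group, and max(..., key=len) keeps only the longer one. A small reproduction on the tail of the first row:

```python
from itertools import groupby

# groupby only merges *adjacent* tuples that share a title.
row = [('Lord', 'Other'), ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
groups = [list(g) for _, g in groupby(row, key=lambda t: t[0])]
print(groups)
# [[('Lord', 'Other')], [('Lady', 'Another'), ('Lady', 'Diana', 'Spencer')], [('Lord', 'Dave', 'Castle')]]

# Two unrelated Lady names share a group, so only the longer one is kept.
print(max(reversed(groups[1]), key=len))  # ('Lady', 'Diana', 'Spencer')
```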
I came up with this solution:
from itertools import groupby
lst = [
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Smiles'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice'), ('Lady', 'Justice', 'Smiles')],
[('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lady', 'Smiles')]
]
def remove_duplicates(lst):
    rv = []
    # sort, drop exact duplicates, then group the tuples by title
    for g, v in groupby([g for g, _ in groupby(sorted(lst))], key=lambda v: v[0]):
        # keep the longest tuple for each title
        rv.append(max(v, key=len))
    return rv

for option in lst:
    print(remove_duplicates(option))
Outputs:
[('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')]
[('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')]
[('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')]
[('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')]
[('Lady', 'Justice', 'Smiles'), ('Lord', 'Justice', 'Smith')]
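This answer was written against the original lst, before the update. Because it groups by title alone, each title ends up with a single tuple, so on the updated data distinct names with the same title collapse into one. A reproduction on the first updated row, with the function copied from above:

```python
from itertools import groupby


def remove_duplicates(lst):
    # Same logic as the answer: sort, drop exact duplicates, group by title,
    # keep the longest tuple per title (the first one wins on ties).
    rv = []
    for _, group in groupby([g for g, _ in groupby(sorted(lst))], key=lambda t: t[0]):
        rv.append(max(group, key=len))
    return rv


updated_row = [('Lord', 'Justice', 'Smith'), ('Lady', 'Justice', 'Smiles'), ('Lord', 'Other'),
               ('Lady', 'Another'), ('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
print(remove_duplicates(updated_row))
# [('Lady', 'Diana', 'Spencer'), ('Lord', 'Dave', 'Castle')]
```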