john

Reputation: 1280

Keeping 2 Duplicates Only From a List Of Lists

I have a list of lists in Python similar to the following:

[
['name1',value2],
['name2',value3],
['name3',value4],
['name4',value4],
['name5',value5],
['name6',value2],
['name7',value2],
['name8',value4]
]

I want to remove any inner list whose 'value' field has already appeared twice, so that at most 2 entries with the same value remain. The resulting list would look like:

[
['name1',value2],
['name2',value3],
['name3',value4],
['name4',value4],
['name5',value5],
['name6',value2]
]

Edit:

I didn't think this would be a problem, so I kept the question simple, but I actually have four values, not two, in each inner list, i.e.:

[
['name1',value2,'something','else'],
['name2',value3,'something','else'],
['name3',value4,'something','else'],
['name4',value4,'something','else'],
['name5',value5,'something','else'],
['name6',value2,'something','else']
]

Ashwini Chaudhary's answer works perfectly but only returns the first two elements of each inner list, not all four. My fault for not adding the complete details. Lesson learned!

Upvotes: 3

Views: 99

Answers (3)

Ashwini Chaudhary

Reputation: 250881

If order doesn't matter:

In [14]: lis=[
['name1','value2','something','else'],
['name2','value3','something','else'],
['name3','value4','something','else'],
['name4','value4','something','else'],
['name5','value5','something','else'],
['name6','value2','something','else']
]

In [22]: dic={}

In [23]: for x in lis:
   ....:     dic.setdefault(x[1],[]).append([x[0]]+x[2:])
   ....:     

In [25]: dic
Out[25]: 
{'value2': [['name1', 'something', 'else'], ['name6', 'something', 'else']],
 'value3': [['name2', 'something', 'else']],
 'value4': [['name3', 'something', 'else'], ['name4', 'something', 'else']],
 'value5': [['name5', 'something', 'else']]}

In [27]: [[y[0]]+[x]+y[1:] for x in dic for y in dic[x][:2]]
Out[27]: 
[['name5', 'value5', 'something', 'else'],
 ['name3', 'value4', 'something', 'else'],
 ['name4', 'value4', 'something', 'else'],
 ['name2', 'value3', 'something', 'else'],
 ['name1', 'value2', 'something', 'else'],
 ['name6', 'value2', 'something', 'else']]
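
For reference, the same grouping idea can be written as a plain script that stores whole rows, so nothing has to be split and reassembled afterwards. This is only a sketch: the function name keep_first_two_grouped and the parameters key_index and limit are made up for illustration and are not part of the session above. Rows still come out grouped by value rather than in their original positions.

```python
def keep_first_two_grouped(rows, key_index=1, limit=2):
    """Group whole rows by row[key_index] and keep at most `limit` rows per group."""
    groups = {}
    for row in rows:
        # setdefault creates the list the first time a value is seen
        groups.setdefault(row[key_index], []).append(row)
    # Flatten the groups again, taking only the first `limit` rows of each
    return [row for group in groups.values() for row in group[:limit]]


lis = [
    ['name1', 'value2', 'something', 'else'],
    ['name2', 'value3', 'something', 'else'],
    ['name3', 'value4', 'something', 'else'],
    ['name4', 'value4', 'something', 'else'],
    ['name5', 'value5', 'something', 'else'],
    ['name6', 'value2', 'something', 'else'],
    ['name7', 'value2', 'something', 'else'],
    ['name8', 'value4', 'something', 'else'],
]

result = keep_first_two_grouped(lis)
print(result)
```

Because each row is kept intact, this works for inner lists of any length as long as key_index points at the field to compare.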

Upvotes: 1

gvalkov

Reputation: 4097

from collections import defaultdict

list1 = [['name1','value2'],
         ['name2','value3'],
         ['name3','value4'],
         ['name4','value4'],
         ['name5','value5'],
         ['name6','value2'],
         ['name7','value2'],
         ['name8','value4']]

list2 = [['name1','value2'],
         ['name2','value3'],
         ['name3','value4'],
         ['name4','value4'],
         ['name5','value5'],
         ['name6','value2']]

d = defaultdict(list)
for name, value in list1:
    d[value].append(name)

list3 = [[name, value] for value, names in d.items() for name in names[:2]]

print(sorted(list3) == sorted(list2))  # True

I am certain that someone will come up with a better solution that preserves order and works as an iterator.

Upvotes: 0

mechmind

Reputation: 1767

This code does the trick:

from collections import defaultdict
def dup2(sequence):
    seen = defaultdict(int)
    for key, value in sequence:
        if seen[value] < 2:
            seen[value] += 1
            yield [key, value]

dup2 is a generator, so it processes the list lazily as you iterate over the result:

for key, value in dup2(seq):
    pass  # ... your code here

To get the result as a plain list, use the list function:

list(dup2(seq))
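
Since the question's edit mentions four fields per inner list, here is a possible generalisation of the same generator that yields each row unchanged instead of unpacking it into exactly two names. It is a sketch under assumptions: the name keep_up_to and the parameters key_index and limit are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def keep_up_to(rows, key_index=1, limit=2):
    """Yield rows in their original order, at most `limit` per distinct row[key_index]."""
    seen = defaultdict(int)  # how many rows with each value have been yielded so far
    for row in rows:
        key = row[key_index]
        if seen[key] < limit:
            seen[key] += 1
            yield row  # the whole row, whatever its length


rows = [
    ['name1', 'value2', 'something', 'else'],
    ['name2', 'value3', 'something', 'else'],
    ['name3', 'value4', 'something', 'else'],
    ['name4', 'value4', 'something', 'else'],
    ['name5', 'value5', 'something', 'else'],
    ['name6', 'value2', 'something', 'else'],
    ['name7', 'value2', 'something', 'else'],
    ['name8', 'value4', 'something', 'else'],
]

kept = list(keep_up_to(rows))
print(kept)
```

This keeps the original order and drops only the third and later rows for each value, so name7 and name8 are the ones removed from the sample data.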

Upvotes: 2
