Turi

Reputation: 77

Remove non-duplicate values from a list

The scenario is something like this:

After joining several lists using:

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

mainlist = list1 + list2 + list3
mainlist.sort()

mainlist now looks like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

I would like to remove every value that is not duplicated. If a value occurs more than once in the list, it must be left untouched; if it occurs only once in mainlist, I would like to delete it.

I tried this approach, but something isn't working:

for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue

but what I get back is a list that looks like the following:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E']  # value "D" is no longer present. Why?

What I would like to get is a list like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C']  # all non-duplicate values have been deleted

I can delete the duplicates with the code below:

for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue

and then as a final result:

mainlist = ['A','B','C']

But the real question is: how can I delete the non-duplicates in a list?

Upvotes: 2

Views: 1161

Answers (6)

Andrej Kesely

Reputation: 195438

Another solution, using numpy:

import numpy as np

# u holds the sorted unique values, c how often each one occurs
u, c = np.unique(mainlist, return_counts=True)
out = np.repeat(u[c > 1], c[c > 1])
print(out)

Prints:

['A' 'A' 'B' 'B' 'C' 'C']

Upvotes: 2

You can find duplicates like this:

duplicates = [item for item in mainlist if mainlist.count(item) > 1]
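If the goal is to change mainlist itself rather than build a second list, one option is slice assignment. This is only a sketch that reuses the sample data from the question; the comprehension on the right-hand side is built in full before anything is assigned, so count() always sees the original contents.

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

# Build the filtered list first, then replace the contents of the same
# list object in place via slice assignment.
mainlist[:] = [item for item in mainlist if mainlist.count(item) > 1]

print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C']
```

Note that count() rescans the whole list for every item, so this is probably fine for short lists but gets slow for long ones.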

Upvotes: 4

ljmc

Reputation: 5264

If you want to output only a list of duplicate elements in your lists, you can use sets and a comprehension to keep only the duplicates.

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)

dups = [x for x in fullset if fulllist.count(x) > 1]

print(dups)  # e.g. ['A', 'C', 'B'] (set order is arbitrary)
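Because a set has no defined order, the result above can come out in any order. If a stable order matters, a possible variant (a sketch, not part of the original answer) is to count with collections.Counter, which keeps keys in first-seen order on Python 3.7 and later:

```python
from collections import Counter

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
fulllist = list1 + list2 + list3

# Counter is a dict subclass, so its keys follow first-seen order.
# Keep each value once, and only if it was counted more than once.
dups = [x for x, n in Counter(fulllist).items() if n > 1]

print(dups)  # ['A', 'B', 'C']
```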

Upvotes: 1

O.Schmitt

Reputation: 129

Your problem is that you are modifying the list while iterating over it. After "D" is removed, "E" shifts down to index 6, which the loop has already visited, so the loop stops without ever checking "E".

Create a copy of the list and remove items only from the copy:

new_list = list(mainlist)
for i in mainlist:
    if mainlist.count(i) <= 1:
        new_list.remove(i)
    else:
        continue
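To see the skipped element for yourself, you can record which items the original loop actually visits. This is just an illustrative trace of the code from the question; the visited list is a name introduced here for the demonstration.

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

visited = []  # hypothetical helper: every item the loop looks at
for item in mainlist:
    visited.append(item)
    if mainlist.count(item) <= 1:
        # Removing shifts all later elements one position to the left,
        # so the element right after the removed one is never visited.
        mainlist.remove(item)

print(visited)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D']
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C', 'E']
```

"E" never shows up in visited, which is why it survives in the result.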

Upvotes: 1

Timur Shtatland

Reputation: 12347

Use collections.Counter to count the list elements, then use a list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.

from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3

cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})

dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']

Upvotes: 2

BrokenBenchmark

Reputation: 19242

You can use collections.Counter() to keep track of the frequency of each item:

from collections import Counter

counts = Counter(mainlist)
print([item for item in mainlist if counts[item] > 1])

This outputs:

['A', 'A', 'B', 'B', 'C', 'C']

Upvotes: 1
