Reputation: 225

Removing some of the duplicates from a list in Python

I would like to remove a certain number of duplicates from a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be:

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)

Is there a way to remove three of the 4's in one pass through the list, but keep the other two?

Upvotes: 7

Answers (5)

Saravanan Subramanian

Reputation: 433

I can solve it in a different way using collections.

from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print(list(cntLi.keys()))  # the distinct values: [1, 2, 3, 4]
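
The snippet above only lists the distinct values. As a minimal sketch of how a Counter could be used to drop a given number of copies, the function below lowers the count for one value and rebuilds the list. The name remove_n_with_counter is hypothetical, and the sketch assumes Python 3.7+ (where Counter.elements() follows first-seen order) and hashable items. It also groups equal values together, so it only preserves the original order when duplicates are already adjacent.

```python
from collections import Counter

def remove_n_with_counter(items, what, how_many):
    # Hypothetical helper: count every value, then lower the count of `what`.
    counts = Counter(items)
    counts[what] = max(counts[what] - how_many, 0)
    # elements() repeats each value `count` times and skips counts below 1.
    return list(counts.elements())

print(remove_n_with_counter([1, 2, 3, 4, 4, 4, 4, 4], 4, 3))  # [1, 2, 3, 4, 4]
```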

Upvotes: -1

Checkmate

Reputation: 1164

If the list is sorted, there's a fast solution:

def remove_n_duplicates(remove_from, what, how_many):
    # Find the first occurrence of `what` (index stays 0 if it is absent).
    index = 0
    for i in range(len(remove_from)):
        if remove_from[i] == what:
            index = i
            break
    end_index = index + how_many
    if end_index > len(remove_from):
        # There aren't enough things to remove; return the list unchanged.
        return remove_from
    for i in range(index, end_index):
        if remove_from[i] != what:
            # Again, there aren't enough things to remove.
            return remove_from
    # Drop the first how_many copies and keep everything else.
    return remove_from[:index] + remove_from[end_index:]

Note that this returns a new list, so you want to do arr = remove_n_duplicates(arr, 4, 3)
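
For a sorted list, the run of equal values can also be located by binary search instead of a linear scan. The sketch below uses the standard bisect module for that. The name remove_n_sorted is hypothetical, and unlike the function above it removes as many copies as are available (up to how_many) rather than giving up when there are too few.

```python
from bisect import bisect_left, bisect_right

def remove_n_sorted(sorted_items, what, how_many):
    # Assumes sorted_items is sorted ascending, so copies of `what` are adjacent.
    start = bisect_left(sorted_items, what)   # first index holding `what`
    stop = bisect_right(sorted_items, what)   # one past the last `what`
    n = min(how_many, stop - start)           # never remove more than exist
    return sorted_items[:start] + sorted_items[start + n:]

print(remove_n_sorted([1, 2, 3, 4, 4, 4, 4, 4], 4, 3))  # [1, 2, 3, 4, 4]
```

The searches are logarithmic, but building the new list by slicing is still linear in the length of the list.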

Upvotes: 0

Aguy

Reputation: 8059

Here is another trick which might be useful sometimes. Not to be taken as the recommended recipe.

def remove_n_duplicates(remove_from, what, how_many):
    exec('remove_from.remove(what);'*how_many)

Upvotes: -1

David N. Sanchez

Reputation: 1

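You can use Python's set functionality with the & operator to create a list of lists and then flatten the list. For the example below, the resulting list is [1, 2, 3, 4, 4]. Note that sets are unordered, so the order of the result is not guaranteed in general.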

x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x) & set(x)] for val in sublist]

As a function you would have the following.

def remove_n_duplicates(remove_from, what, how_many):
    return [val for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what else [item]*remove_from.count(item) for item in set(remove_from) & set(remove_from)] for val in sublist]

Upvotes: 0

mgilson

Reputation: 310167

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
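
The function in the question modified the list in place. If that behaviour is wanted, one possible approach is to combine the generator with slice assignment, sketched below. The generator is repeated so the block runs on its own, and its output is materialized with list() before being assigned back, so the list is not modified while it is still being iterated.

```python
def remove_n_dupes(remove_from, what, how_many):
    # Same generator as above: skip the first how_many copies of `what`.
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # a second reference to the same list object
# Slice assignment replaces the contents without rebinding the name.
lst[:] = list(remove_n_dupes(lst, 4, 3))
print(lst)           # [1, 2, 3, 4, 4]
print(alias is lst)  # True
```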

Keeping a specified number of duplicates of any item is similarly easy if we use a little extra auxiliary storage:

from collections import Counter
def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]

Here the input is the list and the max number of items that you want to keep. The caveat is that the items need to be hashable...
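
If the items are not hashable (lists, for example), a Counter cannot be used. A rough fallback, sketched here under the assumption that the items at least support ==, is to keep the counts in a plain list of [value, count] pairs. The name keep_n_dupes_unhashable is hypothetical, and the lookup is linear, so the whole pass is quadratic in the worst case.

```python
def keep_n_dupes_unhashable(items, how_many):
    seen = []  # [value, count] pairs, matched with == instead of hashing
    for item in items:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                count = entry[1]
                break
        else:
            # First time this value is seen.
            seen.append([item, 1])
            count = 1
        if count <= how_many:
            yield item

data = [[1], [1], [1], [2], [2]]
print(list(keep_n_dupes_unhashable(data, 2)))  # [[1], [1], [2], [2]]
```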

Upvotes: 8
