Jeremy_Tamu

Reputation: 755

Python: Count and remove duplicates in a list of lists

I have a list of lists:

a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
     [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
     [1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
     [5.0, 5.0, 5.0], 
     [1.0]
    ]

a = set(a)  # raises TypeError: unhashable type: 'list'

I need to remove all the duplicates in each inner list while keeping the original order. In addition, I need to count how many times each value occurs in its inner list. For example:

The list of lists after removing the duplicates:

a = [[1.0],
     [2.0, 3.0, 4.0],
     [3.0, 5.0],
     [1.0, 4.0, 5.0],
     [5.0], 
     [1.0]
    ]

The count of each value in the list of lists:

b = [[13],
     [6, 5, 4],
     [8, 3],
     [1, 3, 3],
     [3], 
     [1]
    ]

My code:

for index, lst in enumerate(a):
    seen = set()
    a[index] = [i for i in lst if i not in seen and seen.add(i) is None]

Upvotes: 0

Views: 7938

Answers (6)

Howard Davis

Reputation: 322

There's no need to go to extremes to find this out; it can be done with simple math.

the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]
print(len(the_list) - len(list(set(the_list))))

With comments:

# list with duplicates
the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]

# in real lists where you don't know the number of items,
# get it with len()
list_size = len(the_list)

# remove the duplicates using set();
# the question doesn't ask for a set,
# so convert back with list()
the_list = list(set(the_list))

# how many duplicates?
duplicates = list_size - len(the_list)

print(f"Total items in list: {list_size}")
print(f"Number of duplicates removed: {duplicates}")
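
One caveat: set() does not keep the original order, and the subtraction gives a single total rather than a count per value. A minimal sketch of the same idea applied to each sublist, assuming Python 3.7+ (where dict keeps insertion order); the sample data and the names deduped and removed are only for illustration:

```python
a = [[1.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0, 3.0, 4.0], [5.0, 1.0, 5.0]]

# dict.fromkeys() drops repeats but keeps first-seen order (Python 3.7+)
deduped = [list(dict.fromkeys(inner)) for inner in a]

# same subtraction as above, done once per sublist
removed = [len(inner) - len(unique) for inner, unique in zip(a, deduped)]

print(deduped)  # [[1.0], [2.0, 3.0, 4.0], [5.0, 1.0]]
print(removed)  # [2, 3, 1]
```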

Upvotes: 0

Tyler Hartman

Reputation: 1

I had to develop something similar to this recently. My solution was to iterate through the list and build a list of [value, count] pairs, where count is how many times the value occurs in the original list.

def count_duplicates(input_list):
    count_list = []
    for each in input_list:
        # pair each value with the number of times it occurs
        new_count = [each, input_list.count(each)]
        # record each pair only once
        if new_count not in count_list:
            count_list.append(new_count)
    return count_list

By calling this function on each inner list in a for loop and collecting the results in a new list, you get everything you need: each value together with its count.
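
A rough sketch of that loop. The function is repeated (slightly condensed) so the snippet runs on its own; the sample data and the names values and counts are only for illustration:

```python
def count_duplicates(input_list):
    count_list = []
    for each in input_list:
        new_count = [each, input_list.count(each)]
        if new_count not in count_list:
            count_list.append(new_count)
    return count_list

a = [[2.0, 2.0, 3.0, 4.0, 4.0, 4.0], [1.0, 4.0, 4.0], [5.0]]

values = []
counts = []
for inner in a:
    pairs = count_duplicates(inner)  # e.g. [[2.0, 2], [3.0, 1], [4.0, 3]]
    values.append([value for value, _ in pairs])
    counts.append([count for _, count in pairs])

print(values)  # [[2.0, 3.0, 4.0], [1.0, 4.0], [5.0]]
print(counts)  # [[2, 1, 3], [1, 2], [1]]
```

Keep in mind that list.count() rescans the sublist for every element, so this is fine for short sublists but gets slow for long ones.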

Upvotes: 0

Padraic Cunningham

Reputation: 180401

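This is efficient when each sublist has only a few distinct values (note that set() does not guarantee the original order):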

b = [list(set(x)) for x in a]

c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]

Timings on a list of 50 sublists:

In [8]: %%timeit
   ...: b = []
   ...: c = []
   ...: for inner in a:
   ...:     new_b = []
   ...:     new_c = []
   ...:     for value, repeated in groupby(sorted(inner)):
   ...:         new_b.append(value)
   ...:         new_c.append(sum(1 for _ in repeated))
   ...:     b.append(new_b)
   ...:     c.append(new_c)
   ...: 
10 loops, best of 3: 20.4 ms per loop

In [9]: %%timeit
   ...: dic_count = [Counter(x) for x in a]
   ...: [x.keys() for x in dic_count]
   ...: [x.values() for x in dic_count]
   ...: 
10 loops, best of 3: 39.1 ms per loop

In [10]: %%timeit
   ....: b = [list(set(x)) for x in a]
   ....: c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
   ....: 
100 loops, best of 3: 7.95 ms per loop
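
To reproduce a comparison like this outside IPython, something along these lines should work. The test data here is made up (50 sublists of 200 random values) and the function names are only for illustration, so the numbers will not match the ones above:

```python
import random
import timeit
from collections import Counter

random.seed(0)
# made-up test data: 50 sublists of 200 floats drawn from 1.0..5.0
a = [[float(random.randint(1, 5)) for _ in range(200)] for _ in range(50)]

def with_set_and_count():
    b = [list(set(x)) for x in a]
    c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
    return b, c

def with_counter():
    dic_count = [Counter(x) for x in a]
    b = [list(x.keys()) for x in dic_count]
    c = [list(x.values()) for x in dic_count]
    return b, c

runs = 20
for func in (with_set_and_count, with_counter):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1000:.3f} ms per loop")
```

The count() version makes one pass over the sublist per distinct value, so it is likely to fall behind Counter as the number of distinct values grows.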

Upvotes: 1

ZJS

Reputation: 4051

You probably shouldn't use this code (I was just playing around with some functions I hadn't tried before), but it gets you the desired output:

from collections import Counter

# Python 3: the built-in zip is lazy, so it replaces itertools.izip
vals = list(zip(*(zip(*zip(row.keys(), row.values())) for row in (dict(Counter(each)) for each in a))))
print(vals[0], "\n", vals[1])

If I were you I would just work off of this...

[dict(Counter(each)) for each in a]

The output is very clean and more readable than my solution.

Upvotes: 1

furas

Reputation: 142641

Use collections.Counter()

from collections import Counter

a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
     [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
     [1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
     [5.0, 5.0, 5.0], 
     [1.0]
    ]

dic_count = [Counter(x) for x in a]

print(dic_count)

'''
[
    Counter({1.0: 13}),
    Counter({2.0: 6, 3.0: 5, 4.0: 4}),
    Counter({3.0: 8, 5.0: 3}),
    Counter({4.0: 3, 5.0: 3, 1.0: 1}),
    Counter({5.0: 3}),
    Counter({1.0: 1})
]
'''

print([list(x.keys()) for x in dic_count])

'''
[
     [1.0],
     [2.0, 3.0, 4.0],
     [3.0, 5.0],
     [1.0, 4.0, 5.0],
     [5.0],
     [1.0]
]
'''

print([list(x.values()) for x in dic_count])

'''
[
    [13],
    [6, 5, 4],
    [8, 3],
    [1, 3, 3],
    [3],
    [1]
]
'''

Upvotes: 3

Maciej Gol

Reputation: 15854

You can use itertools.groupby:

from itertools import groupby

a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
     [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
     [1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
     [5.0, 5.0, 5.0], 
     [1.0]
    ]

b = []
c = []

for inner in a:
    new_b = []
    new_c = []
    for value, repeated in groupby(sorted(inner)):
        new_b.append(value)
        new_c.append(sum(1 for _ in repeated))

    b.append(new_b)
    c.append(new_c)

print(b)
# [[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
print(c)
# [[13], [6, 5, 4], [8, 3], [1, 3, 3], [3], [1]]
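
Note that sorted() reorders each sublist before grouping. If equal values are already next to each other, as in the question's data, a variant without sorted() keeps the original order. This is only a sketch under that assumption, since groupby merges consecutive equal items and nothing else; the sample data is made up:

```python
from itertools import groupby

a = [[2.0, 2.0, 3.0, 3.0, 3.0], [5.0, 5.0, 1.0, 4.0, 4.0], [1.0]]

b = []
c = []
for inner in a:
    # one (value, run_length) pair per run of consecutive equal items
    runs = [(value, len(list(repeated))) for value, repeated in groupby(inner)]
    b.append([value for value, _ in runs])
    c.append([length for _, length in runs])

print(b)  # [[2.0, 3.0], [5.0, 1.0, 4.0], [1.0]]
print(c)  # [[2, 3], [2, 1, 2], [1]]
```

If a value can reappear later in the same sublist (for example [1.0, 2.0, 1.0]), it will show up once per run, so the sorted() version or a Counter is the safer choice there.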

Upvotes: 3
