Reputation: 755
I have a list of lists:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
a= set(a)
What I need to do is remove all the duplicates within each sublist while keeping the original order. In addition, I need to count how many times each value occurs in its sublist. For example:
The list of list after removing the duplicates:
a = [[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
The count of each value in the list of lists:
b = [[13],
[6, 5, 4],
[8, 3],
[1, 3, 3],
[3],
[1]
]
My code:
for index, lst in enumerate(a):
    seen = set()
    a[index] = [i for i in lst if i not in seen and seen.add(i) is None]
Upvotes: 0
Views: 7938
Reputation: 322
There's no need to go to extremes to find this out; it can be done with simple math.
the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]
print(len(the_list) - len(list(set(the_list))))
With comments:
# list with duplicates
the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]
# for real lists where you don't know the number of items in advance,
# get it with len()
list_size = len(the_list)
# remove the duplicates using set();
# the question doesn't ask for a set,
# so convert the result back to a list
the_list = list(set(the_list))
# how many duplicates?
duplicates = list_size - len(the_list)
print(f"Total items in list: {list_size}")
print(f"Number of duplicates removed: {duplicates}")
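One caveat: set() does not preserve the order of the items, which the question asks for. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order), that keeps the first occurrence of each item in order and still reports how many duplicates were removed. The helper name dedupe_keep_order is made up for this example:

```python
def dedupe_keep_order(items):
    """Return (unique items in first-seen order, number of duplicates removed)."""
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    unique = list(dict.fromkeys(items))
    return unique, len(items) - len(unique)


unique, removed = dedupe_keep_order([3.0, 5.0, 3.0, 1.0, 5.0, 3.0])
print(unique)   # [3.0, 5.0, 1.0]
print(removed)  # 3
```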
Upvotes: 0
Reputation: 1
I had to develop something similar to this recently. My solution was to iterate through the list and build a list of [value, count] pairs, where count is the number of times the value occurs in the original list.
def count_duplicates(input_list):
    count_list = []
    for each in input_list:
        new_count = [each, input_list.count(each)]
        if count_list.count(new_count) >= 1:
            continue
        else:
            count_list.append(new_count)
    return count_list
By calling the above function on each sublist in a loop and collecting the results in a new list, you get output that contains everything you need.
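The loop described above could look like the following sketch. It repeats count_duplicates in a slightly shortened form so the example is self-contained; the names pairs_per_row, values and counts are illustrative only, and the input is a cut-down version of the list from the question.

```python
def count_duplicates(input_list):
    # same logic as the function above: [value, count] pairs in first-seen order
    count_list = []
    for each in input_list:
        new_count = [each, input_list.count(each)]
        if new_count not in count_list:
            count_list.append(new_count)
    return count_list


a = [[2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
     [1.0, 4.0, 4.0, 5.0]]

# run the function once per sublist
pairs_per_row = [count_duplicates(row) for row in a]

# split the [value, count] pairs into the two lists the question asks for
values = [[value for value, _ in pairs] for pairs in pairs_per_row]
counts = [[count for _, count in pairs] for pairs in pairs_per_row]

print(values)  # [[2.0, 3.0, 4.0], [1.0, 4.0, 5.0]]
print(counts)  # [[2, 3, 1], [1, 2, 1]]
```

Because list.count rescans the sublist for every element, this is quadratic in the sublist length, which is fine for short sublists like the ones in the question.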
Upvotes: 0
Reputation: 180401
This was the fastest of the approaches I timed (note that set does not guarantee that the original order is kept):
b = [list(set(x)) for x in a]
c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
Timings on a list of 50 sublists:
In [8]: %%timeit
   ...: b = []
   ...: c = []
   ...: for inner in a:
   ...:     new_b = []
   ...:     new_c = []
   ...:     for value, repeated in groupby(sorted(inner)):
   ...:         new_b.append(value)
   ...:         new_c.append(sum(1 for _ in repeated))
   ...:     b.append(new_b)
   ...:     c.append(new_c)
   ...:
10 loops, best of 3: 20.4 ms per loop
In [9]: %%timeit
   ...: dic_count = [Counter(x) for x in a]
   ...: [x.keys() for x in dic_count]
   ...: [x.values() for x in dic_count]
   ...:
10 loops, best of 3: 39.1 ms per loop
In [10]: %%timeit
   ....: b = [list(set(x)) for x in a]
   ....: c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
   ....:
100 loops, best of 3: 7.95 ms per loop
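To reproduce a comparison like this outside IPython, the standard timeit module can be used. The sketch below makes assumptions about the setup: the original test data is not shown, so the hypothetical make_data helper builds 50 sublists of random floats, and the function names are made up for this example. The numbers it prints will differ from the timings above.

```python
import random
import timeit
from collections import Counter
from itertools import groupby


def make_data(rows=50, size=200, seed=0):
    # hypothetical test data: the original 50 sublists are not shown
    rng = random.Random(seed)
    return [[float(rng.randint(1, 5)) for _ in range(size)] for _ in range(rows)]


def with_groupby(a):
    b, c = [], []
    for inner in a:
        groups = [(value, sum(1 for _ in run)) for value, run in groupby(sorted(inner))]
        b.append([value for value, _ in groups])
        c.append([count for _, count in groups])
    return b, c


def with_counter(a):
    counters = [Counter(x) for x in a]
    return [list(x.keys()) for x in counters], [list(x.values()) for x in counters]


def with_set_and_count(a):
    b = [list(set(x)) for x in a]
    c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
    return b, c


data = make_data()
for func in (with_groupby, with_counter, with_set_and_count):
    # time 100 calls of each approach on the same data
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```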
Upvotes: 1
Reputation: 4051
You probably shouldn't use this code (I was just playing around with some functions I hadn't tried before), but it gets you your desired output:
from collections import Counter
# zip() is lazy in Python 3, so itertools.izip is no longer needed
vals = list(zip(*(zip(*zip(row.keys(), row.values())) for row in (dict(Counter(each)) for each in a))))
print(vals[0], "\n", vals[1])
If I were you, I would just work from this:
[dict(Counter(each)) for each in a]
The output is very clean and more readable than my solution.
Upvotes: 1
Reputation: 142641
Use collections.Counter():
from collections import Counter
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
dic_count = [Counter(x) for x in a]
print(dic_count)
'''
[
Counter({1.0: 13}),
Counter({2.0: 6, 3.0: 5, 4.0: 4}),
Counter({3.0: 8, 5.0: 3}),
Counter({4.0: 3, 5.0: 3, 1.0: 1}),
Counter({5.0: 3}),
Counter({1.0: 1})
]
'''
print([list(x.keys()) for x in dic_count])
'''
[
[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
'''
print([list(x.values()) for x in dic_count])
'''
[
[13],
[6, 5, 4],
[8, 3],
[1, 3, 3],
[3],
[1]
]
'''
Upvotes: 3
Reputation: 15854
You can use itertools.groupby:
from itertools import groupby
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
b = []
c = []
for inner in a:
    new_b = []
    new_c = []
    for value, repeated in groupby(sorted(inner)):
        new_b.append(value)
        new_c.append(sum(1 for _ in repeated))
    b.append(new_b)
    c.append(new_c)
print(b)
# [[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
print(c)
# [[13], [6, 5, 4], [8, 3], [1, 3, 3], [3], [1]]
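Note that groupby only merges consecutive equal items, which is why the code sorts each sublist first. Sorting also means the result comes back in ascending order rather than in the original order; for the sample data the two happen to coincide. A small sketch with made-up data showing the difference:

```python
from itertools import groupby

inner = [5.0, 5.0, 1.0, 5.0]  # made-up, unsorted example data

# without sorting: one group per consecutive run, so 5.0 appears twice
runs = [(value, len(list(run))) for value, run in groupby(inner)]
print(runs)  # [(5.0, 2), (1.0, 1), (5.0, 1)]

# with sorting: one group per distinct value, in ascending order
totals = [(value, len(list(run))) for value, run in groupby(sorted(inner))]
print(totals)  # [(1.0, 1), (5.0, 3)]
```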
Upvotes: 3