Reputation: 23
If I have two lists, List_A
and List_B
what faster way is there if I want to count the number of occurrences of each element in List_B
from List_A
and parse the results in a new List_C
?
Usually Id use list comprehension but once the number of elements in either List_A
or List_B
increase above 100 000 it starts taking quite a lot of time.
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
List_C = [List_A.count(x) for x in List_B]
List_C
#Output:
#List_C = [3, 1, 2, 2, 1, 3, 1, 3]
Upvotes: 2
Views: 104
Reputation: 817
def a():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h', 'z']
count = Counter(List_A)
List_C = [count.get(x) for x in List_B]
100000 loops, best of 5: 2.96 usec per loop
Using Counter() and checking for None:
def b():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h', 'z']
count = Counter(List_A)
List_C = []
for x in List_B:
val = count.get(x)
if val != None:
List_C.append(val)
100000 loops, best of 5: 3.45 usec per loop
Without checking None values:
def c():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h', 'z']
count = Counter(List_A)
List_C = []
for x in List_B:
List_C.append(count.get(x))
100000 loops, best of 5: 3.04 usec per loop
Using @Mafa's solution, which doesn't work if List_B have values that don't appear in List_A:
def d():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
occ = dict()
for x in List_A:
occ.setdefault(x, 0)
occ[x] += 1
List_C = [occ[x] for x in List_B]
100000 loops, best of 5: 2.59 usec per
Mafa's Solution with checks for existing values:
def e():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
occ = dict()
for x in List_A:
occ.setdefault(x, 0)
occ[x] += 1
List_C = [occ.get(x, 0) for x in List_B]
100000 loops, best of 5: 3.1 usec per loop
The two next functions were proposed by @Alex Waygood
def f():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
c = Counter(filter(set(List_B).__contains__, List_A))
List_C = [v for k, v in sorted(c.items())]
50000 loops, best of 5: 4.45 usec per loop
def g():
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h', 'z']
c = Counter(filter(List_B.__contains__, List_A))
List_C = [v for k, v in sorted(c.items())]
50000 loops, best of 5: 4.78 usec per loop
(Not sure if need to divide by two here -- apparently not --, since there are 50000 loops instead of 100000, if that is the case we have a clear winner here)
Upvotes: 4
Reputation: 3900
With your solution you perform a number of computations equal to len(List_A) * len(List_B)
(your list comprehension).
Instead, first compute the number of occurences then do your list comprehension:
List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
occ = dict()
for x in List_A:
occ.setdefault(x, 0)
occ[x] += 1
List_C = [occ.get(x, 0) for x in List_B]
By this way you traverse List_A
once and List_B
once.
[Edit]
Updated the list comprehension at last line to address the case where x
is not in List_A
(see @AchilleG's comment)
Upvotes: 1