Mafa
Mafa

Reputation: 23

What faster way is there when counting occurrences of elements in one using another list

If I have two lists, List_A and List_B what faster way is there if I want to count the number of occurrences of each element in List_B from List_A and parse the results in a new List_C? Usually Id use list comprehension but once the number of elements in either List_A or List_B increase above 100 000 it starts taking quite a lot of time.

List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']

List_B = ['a', 'b','c','d','e','f','g','h']

List_C = [List_A.count(x) for x in List_B]

List_C

#Output:

#List_C = [3, 1, 2, 2, 1, 3, 1, 3]

Upvotes: 2

Views: 104

Answers (2)

Achille G
Achille G

Reputation: 817

def a():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h', 'z']

    count = Counter(List_A)
    List_C = [count.get(x) for x in List_B]

100000 loops, best of 5: 2.96 usec per loop


Using Counter() and checking for None:

def b():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h', 'z']

    count = Counter(List_A)

    List_C = []

    for x in List_B:
        val = count.get(x)
        if val != None:
            List_C.append(val)

100000 loops, best of 5: 3.45 usec per loop


Without checking None values:

def c():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h', 'z']

    count = Counter(List_A)

    List_C = []

    for x in List_B:
        List_C.append(count.get(x))

100000 loops, best of 5: 3.04 usec per loop


Using @Mafa's solution, which doesn't work if List_B have values that don't appear in List_A:

def d():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h']
    occ = dict()
    for x in List_A:
        occ.setdefault(x, 0)
        occ[x] += 1
    List_C = [occ[x] for x in List_B]

100000 loops, best of 5: 2.59 usec per


Mafa's Solution with checks for existing values:

def e():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h']
    occ = dict()
    for x in List_A:
        occ.setdefault(x, 0)
        occ[x] += 1
    List_C = [occ.get(x, 0) for x in List_B]

100000 loops, best of 5: 3.1 usec per loop


The two next functions were proposed by @Alex Waygood

def f():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h']
    c = Counter(filter(set(List_B).__contains__, List_A))
    List_C = [v for k, v in sorted(c.items())]

50000 loops, best of 5: 4.45 usec per loop


def g():
    List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
    List_B = ['a', 'b','c','d','e','f','g','h', 'z']
    c = Counter(filter(List_B.__contains__, List_A))
    List_C = [v for k, v in sorted(c.items())]

50000 loops, best of 5: 4.78 usec per loop

(Not sure if need to divide by two here -- apparently not --, since there are 50000 loops instead of 100000, if that is the case we have a clear winner here)

Upvotes: 4

qouify
qouify

Reputation: 3900

With your solution you perform a number of computations equal to len(List_A) * len(List_B) (your list comprehension).

Instead, first compute the number of occurences then do your list comprehension:

List_A = ['a', 'd','f','h','g','e','f','a','f','h','h','d','b','c','c','a']
List_B = ['a', 'b','c','d','e','f','g','h']
occ = dict()
for x in List_A:
    occ.setdefault(x, 0)
    occ[x] += 1
List_C = [occ.get(x, 0) for x in List_B]

By this way you traverse List_A once and List_B once.

[Edit]

Updated the list comprehension at last line to address the case where x is not in List_A (see @AchilleG's comment)

Upvotes: 1

Related Questions