b0bu
b0bu

Reputation: 1230

Python 3 occurrence matching

I started to write this on a whim for little more than curiosity. I've been looking at the code in a visualizer and it looks like it iterates like I'd expect but doesn't output what I think it should. Can someone show me what I'm missing? It's just a funny example of how sql join tables look after processing.

   def query(a=[1,2,3,4], b=[3,1,1,2,3,4,5,6]):
        """
        table A         table B           Expected Output     Actual Output   
        idx value       idx value         indxA indxB         indxA indxB  
        0     1          0    3             0     1             0     1
        1     2          1    1             0     2             0     1
        2     3          2    1             1     3             1     3 
        3     4          3    2             2     0             2     0
                         5    4             2     3             2     0
                         6    5             3     5             3     5
                         7    6

        EXAMPLE
        Table A index 0 occurs at Table B index 1 and 2
        PROBLEM
        Anywhere there are multiple matches only first occurrence prints
        """
        for idx, itemA in enumerate(a):
            if itemA in b:
                for itemB in b:
                    if itemA == itemB:
                        print("{} {}".format(a.index(itemA), b.index(itemB)))


    query()

Upvotes: 0

Views: 65

Answers (2)

Mike Müller
Mike Müller

Reputation: 85442

For large lists iterating over each element in one list for each other element in the other list can be slow. This is a quadratic algorithm.

This is the solution with only lists. I took the printing out as it would use most of time and return the results in a list:

def query_list(a, b):
    res = []
    for index_a, value_a in enumerate(a):
        for index_b, value_b in enumerate(b):
            if value_a == value_b:
                res.append((index_a, index_b))
    return res

This is an alternative implementation using dictionaries:

def query_dict(a, b):
    indices = {value: index for index, value in enumerate(a)}
    res = []
    for index_b, value_b in enumerate(b):
        if value_b in indices:
            res.append((indices[value_b], index_b))
    return sorted(res)

Generating some example data:

import random
a = list(range(1, 1000))
b = [random.randint(1, 100) for x in range(10000)]

The list version:

%timeit query_list(a, b)
1 loops, best of 3: 1.09 s per loop

is much slower than the dict version:

%timeit query_dict(a, b)
100 loops, best of 3: 11.8 ms per loop

This about a factor of 10.

Using larger example data:

import random
a = list(range(1, 10000))
b = [random.randint(1, 100) for x in range(10000)]

The difference becomes even more pronounced:

%timeit query_list(a, b)
1 loops, best of 3: 11.4 s per loop

%timeit query_dict(a, b)
100 loops, best of 3: 13.7 ms per loop

Going up to a factor of close to 100.

Upvotes: 1

ymyzk
ymyzk

Reputation: 983

list.index(x) returns the index in the list of the first item whose value is x. So you should use enumerate(b).

def query(a=[1,2,3,4], b=[3,1,1,2,3,4,5,6]):
    for index_a, value_a in enumerate(a):
        for index_b, value_b in enumerate(b):
            if value_a == value_b:
                print("{} {}".format(index_a, index_b))

query()

Upvotes: 2

Related Questions