Reputation: 1230
I started to write this on a whim for little more than curiosity. I've been looking at the code in a visualizer and it looks like it iterates like I'd expect but doesn't output what I think it should. Can someone show me what I'm missing? It's just a funny example of how sql join tables look after processing.
def query(a=[1,2,3,4], b=[3,1,1,2,3,4,5,6]):
"""
table A table B Expected Output Actual Output
idx value idx value indxA indxB indxA indxB
0 1 0 3 0 1 0 1
1 2 1 1 0 2 0 1
2 3 2 1 1 3 1 3
3 4 3 2 2 0 2 0
5 4 2 3 2 0
6 5 3 5 3 5
7 6
EXAMPLE
Table A index 0 occurs at Table B index 1 and 2
PROBLEM
Anywhere there are multiple matches only first occurrence prints
"""
for idx, itemA in enumerate(a):
if itemA in b:
for itemB in b:
if itemA == itemB:
print("{} {}".format(a.index(itemA), b.index(itemB)))
query()
Upvotes: 0
Views: 65
Reputation: 85442
For large lists iterating over each element in one list for each other element in the other list can be slow. This is a quadratic algorithm.
This is the solution with only lists. I took the printing out as it would use most of time and return the results in a list:
def query_list(a, b):
res = []
for index_a, value_a in enumerate(a):
for index_b, value_b in enumerate(b):
if value_a == value_b:
res.append((index_a, index_b))
return res
This is an alternative implementation using dictionaries:
def query_dict(a, b):
indices = {value: index for index, value in enumerate(a)}
res = []
for index_b, value_b in enumerate(b):
if value_b in indices:
res.append((indices[value_b], index_b))
return sorted(res)
Generating some example data:
import random
a = list(range(1, 1000))
b = [random.randint(1, 100) for x in range(10000)]
The list version:
%timeit query_list(a, b)
1 loops, best of 3: 1.09 s per loop
is much slower than the dict version:
%timeit query_dict(a, b)
100 loops, best of 3: 11.8 ms per loop
This about a factor of 10.
Using larger example data:
import random
a = list(range(1, 10000))
b = [random.randint(1, 100) for x in range(10000)]
The difference becomes even more pronounced:
%timeit query_list(a, b)
1 loops, best of 3: 11.4 s per loop
%timeit query_dict(a, b)
100 loops, best of 3: 13.7 ms per loop
Going up to a factor of close to 100.
Upvotes: 1
Reputation: 983
list.index(x)
returns the index in the list of the first item whose value is x.
So you should use enumerate(b)
.
def query(a=[1,2,3,4], b=[3,1,1,2,3,4,5,6]):
for index_a, value_a in enumerate(a):
for index_b, value_b in enumerate(b):
if value_a == value_b:
print("{} {}".format(index_a, index_b))
query()
Upvotes: 2