Reputation: 65
I have to build a distance matrix for locations
addresses = [(10.0, 20.0), (21.2318, 72.903), (26.4499, 80.3319), (20.0, 20.0), (19.114, 72.8927), (20.4189, 77.0153), (28.5377, 77.1217), (28.6201, 77.118), (28.5257, 77.2781)]
where distance in between two locations, is computed using the following function
def calculate_travel_distance(pointA,pointB):
return geodesic(pointA,pointB).meters
Distance Matrix is being build up using for loop as below
dist_matrix=[]
for pt1 in addresses:
row_list = [round(calculate_travel_distance(pt1,pt2),0) for pt2 in addresses]
dist_matrix.append(row_list)
When size of addresses grow up, it takes long time to execute i.e. 300 locations ( lat/lng pair) it takes 150 seconds to execute. Is it possible to bring down execution time to few secs(may be 10) for close to 400 locations. Please suggest.
Upvotes: 0
Views: 147
Reputation: 338326
Assuming you have a list of points A
through J
, then the matrix pairing all of those up looks like this:
AA AB AC AD AE AF AG AH AI AJ
BA BB BC BD BE BF BG BH BI BJ
CA CB CC CD CE CF CG CH CI CJ
DA DB DC DD DE DF DG DH DI DJ
EA EB EC ED EE EF EG EH EI EJ
FA FB FC FD FE FF FG FH FI FJ
GA GB GC GD GE GF GG GH GI GJ
HA HB HC HD HE HF HG HH HI HJ
IA IB IC ID IE IF IG IH II IJ
JA JB JC JD JE JF JG JH JI JJ
That's what your loop currently calculates. However, the distance AB
and BA
is equal, and the distances on the center line (AA
, BB
, ...) are always zero.
We can cut out workload in half (even less than half, from n^2
to n^2 / 2 - n
) by calculating only the distances between points where x < y
in the matrix.
AB AC AD AE AF AG AH AI AJ
BC BD BE BF BG BH BI BJ
CD CE CF CG CH CI CJ
DE DF DG DH DI DJ
EF EG EH EI EJ
FG FH FI FJ
GH GI GJ
HI HJ
IJ
The blanks can be filled easily by mirroring the upper triangle. Staying within the example, this:
addresses = ['A','B','C','D','E','F','G','H','I','J']
distances = []
for x, a in enumerate(addresses):
row = []
distances.append(row)
for y, b in enumerate(addresses):
if x < y:
row.append(a + b) # actually calculate something
elif x == y:
row.append('--') # that's always 0
else:
row.append(distances[y][x]) # we already calculated that
for row in distances:
print(' '.join(row))
gives us this:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
The next step up in speed could be multi-threading, but maybe this optimization already is good enough for your use case.
A multi-threaded implementation of the above could look like this (it's probably not pythonic in more ways than one, but it does the job):
from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool
# credit https://stackoverflow.com/a/54802737
def chunks(l, n):
"""Yield n number of striped chunks from l."""
for i in range(0, n):
yield l[i::n]
def calculate_travel_distance(a, b):
return a + b
def calculate_distance_matrix(addresses):
# prepare distance matrix, list of lists with n^2 slots
distance_matrix = [['--' for a in addresses] for b in addresses]
# the workload is the upper matrix triangle (where x < y)
# since we're multi-threading, also remember the x,y position
workload = [((a, b),(x, y)) for y, b in enumerate(addresses) for x, a in enumerate(addresses) if x < y]
# worker function
def worker(chunk):
return [(calculate_travel_distance(*points), matrix_pos) for points, matrix_pos in chunk]
# distribute workload over available CPU cores
pool = ThreadPool(cpu_count())
result_chunks = pool.map(worker, chunks(workload, cpu_count()))
# distribute result chunks into their slots
for result_chunk in result_chunks:
for result, matrix_pos in result_chunk:
x, y = matrix_pos
distance_matrix[x][y] = result
distance_matrix[y][x] = result
return distance_matrix
addresses = ['A','B','C','D','E','F','G','H','I','J']
distaince_matrix = calculate_distance_matrix(addresses)
for row in distaince_matrix:
print(' '.join(row))
It prints the same thing:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
Upvotes: 1