Thev
Thev

Reputation: 1125

Python - speed up reverse geo-coding

I am currently performing a reverse-geocoding operation as follows:

import json
from shapely.geometry import shape, Point
import time

with open('districts.json') as f: districts = json.load(f)
# file also kept at https://raw.githubusercontent.com/Thevesh/Display/master/districts.json

def reverse_geocode(lon,lat):
    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point): return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

start_time = time.time()
for i in range(1000): test = reverse_geocode(103, 3)
print('----- Code ran in ' + "{:.3f}".format(time.time() - start_time) + ' seconds -----')

This takes about 13 seconds to reverse geocode 1000 points, which is fine.

However, I'm going to need to reverse geocode 10mil coordinate pairs for a task, which means it'll take 130k seconds (1.5 days) assuming linear complexity. Not fine.

The obvious inefficiency in this algorithm is that it iterates through the entire set of polygons each and every time it classifies a point, which is a giant waste of time.

How can I improve this code? To compute 10mil pairs in a time acceptable for the task, I need to run 1k pairs in 1 second.

Upvotes: 0

Views: 627

Answers (1)

Gustavo Pereira
Gustavo Pereira

Reputation: 11

I arrived at this algorithm using parallelism

if possible, return it to me if it was useful for your purpose. remember that it is an amateur algorithm, it will need adjustments.

import concurrent.futures

with open('districts.json') as f: districts = json.load(f)

def reverse_geocode(lon:int, lat:int) -> list:

    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

if __name__ == '__main__':
    time_start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as process:
        for url in range(1000):
            process.submit(reverse_geocode, 103, 3)

    time_end = time.time()
    print(f'\nfim {round(time_end - time_start, 2)} seconds')

Upvotes: 1

Related Questions