I am trying to compute, at scale, the distance value returned by the nearest_edges function (from the OSMnx library) on a huge dataset, using the lat and long columns as inputs to build my MultiDiGraph. It takes forever to run and sometimes returns null. Is there another solution? I created a user-defined function (code below) so I can apply it to the dataset using its lat/long columns.
My code below:
import numpy as np
import osmnx as ox
from pyspark.sql import types as T
from pyspark.sql.functions import udf
from shapely.geometry import Point

@udf(returnType=T.DoubleType())
def get_distance_to_road(lat_dd=None, long_dd=None, dist_bbox=None):
    try:
        location = (lat_dd, long_dd)
        G = ox.graph_from_point(
            center_point=location,
            dist=dist_bbox,  # meters
            simplify=True,
            retain_all=True,
            truncate_by_edge=True,
            network_type='all'
        )
        Gp = ox.project_graph(G)
        # Point takes (x, y) = (long, lat)
        point_geom_proj, crs = ox.projection.project_geometry(
            Point(long_dd, lat_dd), to_crs=Gp.graph['crs']
        )
        distance = np.round(
            ox.nearest_edges(Gp, point_geom_proj.x, point_geom_proj.y, return_dist=True)[1], 2
        ).item()
    except Exception:
        distance = None
    return distance  # meters
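For reference, I apply the UDF roughly like this (the column names lat_dd/long_dd and the 500 m bounding-box distance are illustrative, not my exact values):

from pyspark.sql import functions as F

# one UDF call per row of the dataframe
df = df.withColumn(
    "dist_to_road_m",
    get_distance_to_road(F.col("lat_dd"), F.col("long_dd"), F.lit(500)),
)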
The nearest_edges function is fast and scalable. Rather, your problem here is everything else you're doing each time you call nearest_edges.
First off, you always want to run it vectorized rather than in a loop. That is, if you have many points to snap to their nearest edges, pass them all at once as numpy arrays to the nearest_edges function for a vectorized, spatially indexed look-up:
import osmnx as ox

# get a projected graph and randomly sample some points to find nearest edges to
G = ox.graph.graph_from_place("Piedmont, CA, USA", network_type="drive")
Gp = ox.projection.project_graph(G)
points = ox.utils_geo.sample_points(ox.convert.to_undirected(Gp), n=1000000)

%%time
ne, dist = ox.distance.nearest_edges(Gp, X=points.x, Y=points.y, return_dist=True)
# wall time = 8.3 seconds
Here, the nearest_edges search matched 1 million points to their nearest edges in about 8 seconds. If you instead put this all into a loop (which, with each iteration, builds a graph, projects the graph and point, then finds the nearest edge to that one point), matching these million points will take approximately forever. This isn't because nearest_edges is slow... it's because everything else in the loop is (relatively) slow.
Your basic options are: (1) build one graph covering your entire study area, project it once, then snap all of your points in a single vectorized nearest_edges call, or (2) keep your per-point loop and accept that it will take approximately forever.
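For the first option, here is a minimal sketch (the place name is illustrative; it assumes all of your points fall within one study area and fit in driver memory as arrays):

import geopandas as gpd
import osmnx as ox

def distances_to_nearest_edges(lats, lons, place="Piedmont, CA, USA"):
    # build and project ONE graph covering the whole study area
    G = ox.graph_from_place(place, network_type="all")
    Gp = ox.project_graph(G)

    # project all points from lat/long to the graph's CRS in one vectorized step
    pts = gpd.GeoSeries(gpd.points_from_xy(lons, lats), crs="EPSG:4326")
    pts = pts.to_crs(Gp.graph["crs"])

    # one vectorized nearest-edges query for every point at once
    ne, dist = ox.distance.nearest_edges(Gp, X=pts.x, Y=pts.y, return_dist=True)
    return dist  # meters, in the same order as the input points

Collect your lat/long columns once, call this once, then join the distances back to your dataframe, rather than rebuilding and reprojecting a graph per row.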
Your example does not give me code to try out myself, but in general I have noticed that OSMnx is not suited to large amounts of data. In particular, nearest_edges uses a lot of CPU and RAM to build an index and then query against it. However, nearest_edges should work and is optimized for speed when querying many points. I would try the following things:
Only use as much data at the beginning as you absolutely need to test your functionality. Then, if it works, just let it run for the time it needs.
Run your code with cProfile or similar to see which part of OSMnx is really making it slow, and go from there (see the sketch below).
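For example, a minimal profiling sketch using the standard library's cProfile (it assumes an undecorated, plain-Python version of your function, here called get_distance_to_road_plain, which is hypothetical; profile outside Spark so the numbers are readable):

import cProfile
import pstats

# profile one representative call; get_distance_to_road_plain is a
# hypothetical undecorated version of the UDF body above
cProfile.run("get_distance_to_road_plain(37.82, -122.23, 500)", "nearest_edge.prof")

# print the ten slowest calls by cumulative time
pstats.Stats("nearest_edge.prof").sort_stats("cumulative").print_stats(10)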