user139014
user139014

Reputation: 1495

performance - finding all points within certain distance by lat/long

I have a CSV file with points tagged by lat/long (~10K points). I'd like to search for all points within a given distance of a user/specified lat/long coordinate — say, for example, the centroid of Manhattan.

I'm pretty new to programming and databases, so this may be a basic question. If so, I apologize. Is it performant to do this search in pure Python without using a database? As in, could I simply read the CSV into memory and do the search with a Python script? If it is performant, would it scale well as the number of points increases?

Or is this simply infeasible in Python, and I need to investigate using a database that supports geospatial queries?

Additionally, how do I go about understanding the performance of these types of calculations so that I can develop a good intuition for this?

Upvotes: 1

Views: 1125

Answers (1)

Tom Bennett
Tom Bennett

Reputation: 2496

This is definitely possible in python without any databases. I would definitely recommend using numpy. I would do the following:

  1. read all points from csv into a numpy array
  2. Calculate the distance of each point to your given point
  3. Sort the distance or simply find the one with minimum distance using argmin

Because all calculations are vectorized, they happen at close to C speed.

With an okay computer, the I/O will take like 2-3 seconds and the calculation will take less than 100-200 milliseconds.

In terms of math, you can try http://en.wikipedia.org/wiki/Haversine_formula

Upvotes: 1

Related Questions