Reputation: 1495
I have a CSV file with points tagged by lat/long (~10K points). I'd like to search for all points within a given distance of a user/specified lat/long coordinate — say, for example, the centroid of Manhattan.
I'm pretty new to programming and databases, so this may be a basic question. If so, I apologize. Is it performant to do this search in pure Python without using a database? As in, could I simply read the CSV into memory and do the search with a Python script? If it is performant, would it scale well as the number of points increases?
Or is this simply infeasible in Python, and I need to investigate using a database that supports geospatial queries?
Additionally, how do I go about understanding the performance of these types of calculations so that I can develop a good intuition for this?
Upvotes: 1
Views: 1125
Reputation: 2496
This is definitely possible in python without any databases. I would definitely recommend using numpy. I would do the following:
Because all calculations are vectorized, they happen at close to C speed.
With an okay computer, the I/O will take like 2-3 seconds and the calculation will take less than 100-200 milliseconds.
In terms of math, you can try http://en.wikipedia.org/wiki/Haversine_formula
Upvotes: 1