Efficient way to represent locations, and query based on proximity?
I'm pondering over how to efficiently represent locations in a database, such that given an arbitrary new location, I can efficiently query the database for candidate locations that are within an acceptable proximity threshold to the subject.
Similar things have been asked before, but I haven't found a discussion based on my criteria for the problem domain.
Things to bear in mind:
- Starting from scratch, I can represent data in any way (eg. long&lat, etc)
- Any result set is time-sensitive, in that it loses validity within a short window of time (~5-15mins) so I can't cache indefinitely
- I can tolerate some reasonable margin of error in results, for example if a location is slightly outside of the threshold, or if a row in the result set has very recently expired
- A language agnostic discussion is perfect, but in case it helps I'm using C# MVC 3 and SQL Server 2012
A couple of first thoughts:
- Use an external API like Google, however this will generate thousands of requests and the latency will be poor
- Use the Haversine function, however this looks expensive and so should be performed on a minimal number of candidates (possibly as a Stored Procedure even!)
- Build a graph of postcodes/zipcodes, such that from any node I can find postcodes/zipcodes that border it, however this could involve a lot of data to store
Some optimization ideas to reduce possible candidates quickly:
- Cache result sets for searches, and when we do subsequent searches, see if the subject is within an acceptable range to a candidate we already have a cached result set for. If so, use the cached result set (but remember, the results expire quickly)
I'm hoping the answer isn't just raw CPU power, and that there are some approaches I haven't thought of that could help me out?
Thank you
ps. Apologies if I've missed previously asked questions with helpful answers, please let me know below.