Reputation: 271904
Suppose I have a list of people and their GPS coordinates:
User1, 52.99, -41.0
User2, 91.44, -21.4
User3, 5.12, 24.5
...
My objective is: given a user's coordinates, find the list of users within 20 meters.
As you probably guessed, these coordinates will be retrieved from mobile phones. Each phone will update its longitude/latitude every 10 seconds and fetch that list of users within 20 meters. It's dynamic.
I would like to know the best way to do this so that it can scale.
By the way, there is already a formula that can calculate the distance between 2 coordinates: http://www.johndcook.com/python_longitude_latitude.html. I just need to know the best way to do this technically (trees? a database? what architecture? More specifically, how would you tie the long/lat distance formula into the SELECT statement?)
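For reference, a minimal Python version of the great-circle distance from the linked article (spherical law of cosines; the 6,371 km mean Earth radius is my assumption, since the article works on a unit sphere):

```python
import math

def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points,
    using the spherical law of cosines as in the linked article."""
    # Convert latitude to the polar angle phi, longitude to theta
    phi1 = math.radians(90.0 - lat1)
    phi2 = math.radians(90.0 - lat2)
    theta1 = math.radians(lon1)
    theta2 = math.radians(lon2)
    # Spherical law of cosines; clamp against floating-point drift
    cos_arc = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
               + math.cos(phi1) * math.cos(phi2))
    arc = math.acos(min(1.0, max(-1.0, cos_arc)))
    return arc * 6371000.0  # mean Earth radius in meters
```

One degree of longitude at the equator comes out to roughly 111 km with this function, which is the usual sanity check.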
Upvotes: 2
Views: 503
Reputation: 425431
1. Create a MyISAM table with a column of datatype Point.
2. Create a SPATIAL index on this column.
3. Convert the GPS coords into UTM (grid) coords and store them in your table.
4. Issue this query:
SELECT  user_id,
        GLength(LineString(user_point, @mypoint)) AS distance
FROM    users
WHERE   MBRWithin(user_point,
                  LineString(Point(X(@mypoint) - 20, Y(@mypoint) - 20),
                             Point(X(@mypoint) + 20, Y(@mypoint) + 20)))
        AND GLength(LineString(user_point, @mypoint)) <= 20
Note that this query will most probably be run against very volatile data, so you will also need an additional filter on the time of each position update.
Since MySQL cannot combine SPATIAL indexes, it will be better to use some kind of surface-tiling technique:
1. Split the Earth's surface into a number of tiles, say, 1" x 1" (that's about 30 meters along the meridian and 30 * COS(lat) meters along the parallel).
2. Store the tile in a CHAR(14) column: 7 digits of the lat + 7 digits of the lon (14 digits in all). Disable key compression on this column.
3. Create a composite index on (time, tile).
4. On the client, calculate all possible tiles your mates may be in. For a 20-meter distance, this will be at most 9 tiles, unless you are deep in the North or South. However, you may change the tiling algorithm to handle these cases.
5. Issue this query:
SELECT  *
FROM    (
        SELECT tile1 AS tile
        UNION ALL
        SELECT tile2
        UNION ALL
        …
        ) tiles
JOIN    users u
ON      u.tile = tiles.tile
        AND u.time >= NOW()
        AND GLength(LineString(u.user_point, @mypoint)) <= 20
where tile1 etc. are the precalculated tiles.
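A Python sketch of the client-side tile calculation described above. The answer only specifies "7 digits of lat + 7 digits of lon"; the exact encoding here (arcsecond counts offset by 324000 and 648000 so they are non-negative and fixed-width) is my assumption, and any consistent fixed-width scheme would do:

```python
import math

def tile_key(lat, lon):
    """Encode the 1" x 1" tile containing (lat, lon) as a CHAR(14) key:
    7 digits of latitude + 7 digits of longitude, in whole arcseconds
    offset to be non-negative (encoding is an illustrative assumption)."""
    lat_sec = int(math.floor(lat * 3600)) + 324000   # 0 .. 648000
    lon_sec = int(math.floor(lon * 3600)) + 648000   # 0 .. 1296000
    return "%07d%07d" % (lat_sec, lon_sec)

def candidate_tiles(lat, lon):
    """The (at most 9) tiles a user within ~20 m of (lat, lon) may fall
    into: the point's own tile plus its 8 neighbours."""
    one_sec = 1.0 / 3600.0
    return sorted({tile_key(lat + di * one_sec, lon + dj * one_sec)
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)})
```

The client would substitute the strings returned by `candidate_tiles` for `tile1`, `tile2`, … in the query above.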
SQL Server implements this algorithm for its spatial indexes (rather than the R-Tree that MySQL uses).
Upvotes: 3
Reputation: 38603
Well, the naive approach would be to do an O(n) pass over all points, compute each one's distance from the current point, and keep those within 20 meters. This is perfectly OK for small datasets (say <= 500 points), but on larger sets it's going to be quite slow. In SQL, this would be along the lines of:
SELECT point_id, DIST_FORMULA(x, y) AS distance
FROM points
WHERE DIST_FORMULA(x, y) < 20
To address the inefficiency of the above method, you would have to use some sort of preprocessing step, most likely space partitioning. That can often dramatically improve performance in nearest-neighbour searches like this. However, in your case, if all the points are updated every 10 seconds, you would have to do an Ω(n) pass to update the position of each point in the space-partitioning tree. If you run more than a few queries between each update, the tree pays off; otherwise it's simply overhead.
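The naive O(n) pass sketched above, in Python, using the haversine formula as a stand-in for the DIST_FORMULA placeholder (the point-list format is assumed for illustration):

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in meters

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine form)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def users_within(points, lat, lon, radius_m=20.0):
    """Naive O(n) scan: compute every distance, keep those under radius_m.
    `points` is an iterable of (id, lat, lon) tuples."""
    result = []
    for pid, plat, plon in points:
        d = haversine(plat, plon, lat, lon)
        if d <= radius_m:
            result.append((pid, d))
    return result
```

With updates every 10 seconds, this scan simply reruns against the freshest coordinates; there is no index to maintain.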
Upvotes: 2
Reputation: 3118
Chapter 11 of "Database Design: Know It All" has some thoughts on how to design such a database.
Upvotes: 0