Reputation: 271904
Suppose I have a list of people and their GPS coordinates:
User1, 52.99, -41.0
User2, 91.44, -21.4
User3, 5.12, 24.5
...
My objective is: given a user's coordinates, find the list of users within 20 meters.
As you probably guessed, these coordinates will be retrieved from mobile phones. Each phone will update its longitude/latitude every 10 seconds and fetch that list of users within 20 meters. It's dynamic.
I would like to know the best way to do this so that it can scale.
By the way, there is already a formula that can calculate the distance between 2 coordinates: http://www.johndcook.com/python_longitude_latitude.html. I just need to know the best way to do this technically (trees? a database? what architecture? More specifically, how would you tie the long/lat distance formula into the SELECT statement?)
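For reference, a minimal Python version of the great-circle distance from the linked article (spherical law of cosines; the 6,371 km mean Earth radius is my assumption, since the article works on a unit sphere):

```python
import math

def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points,
    using the spherical law of cosines as in the linked article."""
    # Convert latitude to the polar angle phi, longitude to theta
    phi1 = math.radians(90.0 - lat1)
    phi2 = math.radians(90.0 - lat2)
    theta1 = math.radians(lon1)
    theta2 = math.radians(lon2)
    # Spherical law of cosines; clamp against floating-point drift
    cos_arc = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
               + math.cos(phi1) * math.cos(phi2))
    arc = math.acos(min(1.0, max(-1.0, cos_arc)))
    return arc * 6371000.0  # mean Earth radius in meters
```

One degree of longitude at the equator comes out to roughly 111 km with this function, which is the usual sanity check.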
Upvotes: 2
Views: 503
Reputation: 425431
1. Create a MyISAM table with a column of datatype Point.
2. Create a SPATIAL index on this column.
3. Convert the GPS coords into UTM (grid) coords and store them in your table.
4. Issue this query:
SELECT  user_id,
        GLength(LineString(user_point, @mypoint)) AS distance
FROM    users
WHERE   MBRWithin(user_point,
                  LineString(Point(X(@mypoint) - 20, Y(@mypoint) - 20),
                             Point(X(@mypoint) + 20, Y(@mypoint) + 20)))
        AND GLength(LineString(user_point, @mypoint)) <= 20
Note that this query will most probably be run against very volatile data, so you will also need an additional filter on the time of each position update.
Since MySQL cannot combine SPATIAL indexes, it will be better to use some kind of surface-tiling technique:
1. Split the Earth's surface into a number of tiles, say, 1" x 1" (that's about 30 meters along the meridian and 30 * COS(lat) meters along the parallel).
2. Store the tile in a CHAR(14) column: 7 digits of the lat + 7 digits of the lon (14 digits in all). Disable key compression on this column.
3. Create a composite index on (time, tile).
4. On the client, calculate all possible tiles your mates may be in. For a 20-meter distance, this will be at most 9 tiles, unless you are deep in the North or South. However, you may change the tiling algorithm to handle these cases.
5. Issue this query:
SELECT  *
FROM    (
        SELECT tile1 AS tile
        UNION ALL
        SELECT tile2
        UNION ALL
        …
        ) tiles
JOIN    users u
ON      u.tile = tiles.tile
        AND u.time >= NOW()
        AND GLength(LineString(u.user_point, @mypoint)) <= 20
where tile1 etc. are the precalculated tiles.
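A Python sketch of the client-side tile calculation described above. The answer only specifies "7 digits of lat + 7 digits of lon"; the exact encoding here (arcsecond counts offset by 324000 and 648000 so they are non-negative and fixed-width) is my assumption, and any consistent fixed-width scheme would do:

```python
import math

def tile_key(lat, lon):
    """Encode the 1" x 1" tile containing (lat, lon) as a CHAR(14) key:
    7 digits of latitude + 7 digits of longitude, in whole arcseconds
    offset to be non-negative (encoding is an illustrative assumption)."""
    lat_sec = int(math.floor(lat * 3600)) + 324000   # 0 .. 648000
    lon_sec = int(math.floor(lon * 3600)) + 648000   # 0 .. 1296000
    return "%07d%07d" % (lat_sec, lon_sec)

def candidate_tiles(lat, lon):
    """The (at most 9) tiles a user within ~20 m of (lat, lon) may fall
    into: the point's own tile plus its 8 neighbours."""
    one_sec = 1.0 / 3600.0
    return sorted({tile_key(lat + di * one_sec, lon + dj * one_sec)
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)})
```

The client would substitute the strings returned by `candidate_tiles` for `tile1`, `tile2`, … in the query above.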
SQL Server implements this algorithm for its spatial indexes (rather than the R-Tree that MySQL uses).
Upvotes: 3
Reputation: 38603
Well, the naive approach would be to do an O(n) pass over all points, compute each one's distance from the current point, and keep those within 20 meters. This is perfectly OK for small datasets (say <= 500 points), but on larger sets it's going to be quite slow. In SQL, this would be along the lines of:
SELECT point_id, DIST_FORMULA(x, y) AS distance
FROM points
WHERE DIST_FORMULA(x, y) < 20
To address the inefficiency of the above method, you would have to use some sort of preprocessing step, most likely space partitioning. That can often dramatically improve performance in nearest-neighbour searches like this. However, in your case, if all the points are updated every 10 seconds, you would have to do an Ω(n) pass to update the position of each point in the space-partitioning tree. If you run more than a few queries between each update, the tree pays off; otherwise it's simply overhead.
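The naive O(n) pass sketched above, in Python, using the haversine formula as a stand-in for the DIST_FORMULA placeholder (the point-list format is assumed for illustration):

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in meters

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine form)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def users_within(points, lat, lon, radius_m=20.0):
    """Naive O(n) scan: compute every distance, keep those under radius_m.
    `points` is an iterable of (id, lat, lon) tuples."""
    result = []
    for pid, plat, plon in points:
        d = haversine(plat, plon, lat, lon)
        if d <= radius_m:
            result.append((pid, d))
    return result
```

With updates every 10 seconds, this scan simply reruns against the freshest coordinates; there is no index to maintain.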
Upvotes: 2
Reputation: 3118
Chapter 11 of "Database Design: Know It All" has some thoughts on how to design such a database.
Upvotes: 0