Reputation: 10784
I need to analyze a set of GPS coordinates in Python and find the most frequent location. Because GPS data is imprecise, the recorded locations are not exact. This is difficult to explain (and to search for on Google), so here is an example:
Suppose I run the script on the coordinates where my drives started and stopped, with a location radius of, say, 20 m. It should find that the most frequent places are my home and my work (each with a radius of 20 m). It does not matter where exactly I parked within that radius.
Is there any library in python that can perform such operations? What do you recommend?
Thanks
Upvotes: 0
Views: 1607
Reputation: 28737
For counting the most frequent locations, a simple approach is to use only the first 3 digits after the latitude/longitude decimal point, or better, to round to 3 digits after the decimal point.
At the equator:
4 digits: 11 m
3 digits: 111 m
2 digits: 1.1 km
1 digit: 11.1 km
0 digits: 111.111 km (distance between two meridians one degree apart: 40 000 000 m / 360)
Then you could use a hash table: multiply by e.g. 1000 to get rid of the 3 decimal places,
and store the result as a java.awt.Point in the hash table (in Python, a tuple of two integers works as a dictionary key).
There are better solutions, but this gives a first idea.
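Here is a minimal Python sketch of the rounding idea, assuming the input is a list of (lat, lon) tuples in decimal degrees. The function name `most_frequent_cells` and its parameters are made up for illustration. Going by the table above, `digits=4` gives cells of roughly 11 m and `digits=3` roughly 111 m. Note that two points close to a cell border can land in different cells, so this is only an approximation of a "20 m radius".

```python
from collections import Counter


def most_frequent_cells(points, digits=4, top=2):
    """Count points per rounded lat/lon cell.

    points: iterable of (lat, lon) in decimal degrees (assumed input format).
    Returns a list of ((lat, lon), count) for the `top` busiest cells.
    """
    scale = 10 ** digits
    # Integer keys avoid float comparison problems in the hash table.
    counts = Counter(
        (round(lat * scale), round(lon * scale)) for lat, lon in points
    )
    return [
        ((ilat / scale, ilon / scale), n)
        for (ilat, ilon), n in counts.most_common(top)
    ]
```

Usage would be something like `most_frequent_cells(stops, digits=4, top=2)` to get the two busiest cells, e.g. home and work.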
Upvotes: 0
Reputation: 3483
If you're mostly interested in the places you go, consider taking only the first and last points of each drive, and keeping intermediate points only if you stay there longer than some time x, for example if your average speed over the last k data points is below some threshold. That should make it much easier to apply a clustering technique (such as k-means clustering).
Something that may come in handy is using approximate nearest neighbors to find for any given point the collection of points that are relatively near it.
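As a rough illustration of the "points near a given point" idea, here is a brute-force sketch in plain Python. It is an exact O(n²) search, not an approximate nearest-neighbor structure, so it only stands in for one on small data sets. The names `haversine_m` and `densest_point` are invented for this example, and the Earth radius of 6 371 000 m is the usual mean-radius approximation.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    earth_radius_m = 6_371_000.0  # mean Earth radius (approximation)
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def densest_point(points, radius_m=20.0):
    """Return (point, count) for the point with the most neighbors within radius_m.

    The count includes the point itself. Brute force: compares every pair.
    """
    best, best_count = None, -1
    for p in points:
        count = sum(1 for q in points if haversine_m(p, q) <= radius_m)
        if count > best_count:
            best, best_count = p, count
    return best, best_count
```

For more than a few thousand points you would probably want a spatial index instead of the double loop.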
To take a page from graphics, you might even try superimposing a fine-resolution grid over the space of all data points, and for each point make a splat of a small radius onto this grid. Every time you add a splat, you can accumulate the time you spent at that point and then keep track as you go of the points in the grid with the most accumulated time.
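The splatting idea could look roughly like this in Python. This is only a sketch under several assumptions: samples are (lat, lon, seconds) tuples, one degree is taken as about 111 111 m (40 000 000 m / 360), longitude is scaled by the cosine of the mean latitude (fine for a city-sized area, not for the whole globe), and the names `splat_grid` and `hottest_cell` as well as the 5 m cell size are made up for illustration.

```python
import math
from collections import defaultdict

M_PER_DEG = 40_000_000 / 360  # approx. meters per degree (assumption)


def splat_grid(samples, cell_m=5.0, radius_m=20.0):
    """Accumulate dwell time on a fine grid.

    samples: iterable of (lat, lon, seconds) (assumed input format).
    Returns a dict mapping grid cell (ix, iy) -> accumulated seconds.
    """
    samples = list(samples)
    mean_lat = sum(s[0] for s in samples) / len(samples)
    m_per_deg_lon = M_PER_DEG * math.cos(math.radians(mean_lat))
    reach = math.ceil(radius_m / cell_m)  # splat radius in cells
    grid = defaultdict(float)
    for lat, lon, seconds in samples:
        cx = math.floor(lon * m_per_deg_lon / cell_m)
        cy = math.floor(lat * M_PER_DEG / cell_m)
        # Add the time to every cell inside a disc around the sample.
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                if (dx * dx + dy * dy) * cell_m ** 2 <= radius_m ** 2:
                    grid[(cx + dx, cy + dy)] += seconds
    return grid


def hottest_cell(grid):
    """Return ((ix, iy), seconds) for the cell with the most accumulated time."""
    return max(grid.items(), key=lambda kv: kv[1])
```

Instead of scanning the grid at the end with `hottest_cell`, you could also track the running maximum inside the loop, as described above.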
Upvotes: 1