Reputation: 2645
I have a group of users. The user count could be 50 or could be 2000. Each should have a long/lat that I have retrieved from Google Geo api.
I need to query them all, and group them by proximity and a certain count. Say the count is 12 and I have 120 users in the group. I want to group people by how close they are (long/lat) to other people. So that I wind up with 10 groups of people who are close in proximity.
I currently have the google geo coding api setup and would prefer to use that.
TIA.
-- Update I have been googling about this for awhile and it appears that I am looking for a spatial query that returns groups by proximity.
Upvotes: 5
Views: 2025
Reputation: 542
Use GeoHash algorithm[1]. There is a PHP implementation[2]. You may pre-calculate geohashes with different precision, store them in SQL database alongside lat-lon values and query using native GROUP BY.
Upvotes: 3
Reputation: 3237
Keep in mind that this problem grows exponentially with every user you add, as the amount of distance calculations is linked to the square of the number of users (it's actually N*(N-1)
distances... so a 2000 user base would mean almost 4 million distance calculations on every pass. Just keep that in mind when sizing the resources you need
Are you looking to group them based on straight-line (actually great circle) distance or based on walking/driving distance?
If the former, the great circle distance can be approximated with simple math if you're able to tolerate a small margin of error and wish to assume the earth is a sphere. From GCMAP.com:
Earth's hypothetical shape is called the geoid and is approximated by an ellipsoid or an oblate sphereoid. A simpler model is to use a sphere, which is pretty close and makes the math MUCH easier. Assuming a sphere of radius 6371.2 km, convert longitude and latitude to radians (multiply by pi/180) and then use the following formula:
theta = lon2 - lon1
dist = acos(sin(lat1) × sin(lat2) + cos(lat1) × cos(lat2) × cos(theta))
if (dist < 0) dist = dist + pi
dist = dist × 6371.2
The resulting distance is in kilometers.
Now, if you need precise calculations and are willing to spend the CPU cycles needed for much complex math, you can use Vincenty's Formulae, which uses the WGS-84 reference ellipsoid model of the earth which is used for navigation, mapping and whatnot. More info HERE
As to the algorithm itself, you need to build a to-from matrix with the result of each calculation. Each row and column would represent each node. Two simplifications you may consider:
$dist[n][m] == $dist[m][n]
(no need to calculate the whole matrix, just half of it)$dist[m][m]
to an arbitrarily defined and abnormally large constant ($dist[m][m] = 22000 (miles)
for instance. Will work as long as all your users are on the planet)After making all the calculations, use an array sorting method to find the X closest nodes to each node and there you have it (you may or may not want to prevent a user being grouped on more than one group, but that's just business logic)
Actual code would be a little too much to provide at this time without seeing some of your progress first, but this is basically what you need to do algoritmically.
Upvotes: 6
Reputation: 1605
... it appears that I am looking for a spatial query that returns groups by proximity. ...
You could use hdbscan. Your groups are actually clusters in hdbscan wording. You would need to work with min_cluster_size and min_samples to get your groups right.
https://hdbscan.readthedocs.io/en/latest/parameter_selection.html
https://hdbscan.readthedocs.io/en/latest/
It appears that hdbscan runs under Python.
Here are two links on how to call Python from PHP: Calling Python in PHP, Running a Python script from PHP
Here is some more information on which clustering algorithm to choose: http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb
http://scikit-learn.org/stable/modules/clustering.html#clustering
Upvotes: 5