Reputation: 937
I have a MySQL dataset that consists of latitude, longitude, and a value. I am trying to sum up values whose latitude and longitude coordinates are within a given radii of other latitude and longitude coordinates (let us call them the "focal points"). Most tricky is that I am trying to separate out different coordinates from zones of overlap — e.g., where radius 1 overlaps with radius 2.
Each of focal points around which there are radii has multiple "zones" of radii, and so for any given set of lat/long coordinates there can be quite a lot of things to sum up. I have managed to put together a query that mostly works, though it is a bit slow:
Select
Sum(If(`zone`='z0_0x1_0',`value`,0)) as `z0_0x1_0`,
Sum(If(`zone`='z0_0x1_1',`value`,0)) as `z0_0x1_1`,
Sum(If(`zone`='z0_0x1_2',`value`,0)) as `z0_0x1_2`,
Sum(If(`zone`='z0_0x1_3',`value`,0)) as `z0_0x1_3`,
Sum(If(`zone`='z0_1x1_0',`value`,0)) as `z0_1x1_0`,
Sum(If(`zone`='z0_1x1_1',`value`,0)) as `z0_1x1_1`,
Sum(If(`zone`='z0_1x1_2',`value`,0)) as `z0_1x1_2`,
Sum(If(`zone`='z0_2x1_0',`value`,0)) as `z0_2x1_0`,
Sum(If(`zone`='z0_2x1_1',`value`,0)) as `z0_2x1_1`,
Sum(If(`zone`='z0_3x1_0',`value`,0)) as `z0_3x1_0`,
Sum(If(`zone`='z0_3x1_1',`value`,0)) as `z0_3x1_1`,
Sum(If(`zone`='z0_0',`value`,0)) as `z0_0`,
Sum(If(`zone`='z0_1',`value`,0)) as `z0_1`,
Sum(If(`zone`='z0_2',`value`,0)) as `z0_2`,
Sum(If(`zone`='z0_3',`value`,0)) as `z0_3`,
Sum(If(`zone`='z1_0',`value`,0)) as `z1_0`,
Sum(If(`zone`='z1_1',`value`,0)) as `z1_1`,
Sum(If(`zone`='z1_2',`value`,0)) as `z1_2`,
Sum(If(`zone`='z1_3',`value`,0)) as `z1_3`
From
(Select `lat`, `lng`, `value`,
Case
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_0x1_0'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_0x1_1'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z0_0x1_2'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 0 And 1.3333495959677)) Then 'z0_0x1_3'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_1x1_0'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_1x1_1'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z0_1x1_2'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_2x1_0'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_2x1_1'
When ((`dist_0` Between 0 And 1.0573158612197) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_3x1_0'
When ((`dist_0` Between 0 And 1.0573158612197) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_3x1_1'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324)) Then 'z0_0'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095)) Then 'z0_1'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678)) Then 'z0_2'
When ((`dist_0` Between 0 And 1.0573158612197)) Then 'z0_3'
When ((`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z1_0'
When ((`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z1_1'
When ((`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z1_2'
When ((`dist_1` Between 0 And 1.3333495959677)) Then 'z1_3'
End As `zone`
From
(Select `lat`, `lng`, `value`,
(acos(0.65292272498833*sin(radians(`lat`)) + 0.75742452772129*cos(radians(`lat`))*cos(radians(`lng`)-(-1.2910922519714))) * 6371) as `dist_0`,
(acos(0.65251345816785*sin(radians(`lat`)) + 0.75777713538338*cos(radians(`lat`))*cos(radians(`lng`)-(-1.2916315412569))) * 6371) as `dist_1`
From `pop`
Where
((`lat` Between 40.714353892125 And 40.810300107875) And (`lng` Between -74.037474145971 And -73.910799854029)) Or
((`lat` Between 40.673205922895 And 40.789544077105) And (`lng` Between -74.081798776797 And -73.928273223203))
)
As FirstCut
)
As Zonecut
Here's the logic of the thing:
First, it grabs bounding boxes around the maximum radii for each focal point. (This is the FirstCut query.) This reduces the number of data points we are looking at by several orders of magnitude.
Then it processes all of that data and gets each datapoint's distance from the focal points (in this case, dist_0
and dist_1
, but there can be an arbitrary number of focal points — I have used two in this example just to show how it works). This is the Haversine formula for great circle distances.
Then the Case statements kick in, assigning each coordinate a member of a "zone," which are processed from most complex to least complex. The zones codes just mean "zone X, radii Y" — so "z0_1" means "zone 0, radii 1". If there is an "x" it means that it is the intersection of multiple zones. This "zone code" is just assigned as a string.
Lastly, the zone codes are used to sum up everything, by assigning the zone names and then those Sum(If()) statements. (For whatever reason, If() seems to work marginally faster than Case() here.)
Which outputs to my script (PHP) a list of zones and sums. Now obviously this whole thing is procedurally generated, because you have to calculate all of the possible zones that are actually going to have "hits" in them ahead of time, and that is all done as pre-processing to avoid doing it in SQL.
Is there a more clever way to do this? The bit where I assign them a string, and then filter that string into fields... it looks hacky, not very elegant. But I couldn't find a better way to categorize them into fields in one big Case statement (which works much faster than many Case statements, it seems).
Any and all feedback on this would be appreciated. The MySQL tables are massive (many millions of rows) and indexed to all holy hell. Running the above query takes about 0.6 seconds, which isn't too bad, but as more focal points are added the queries start taking longer, and I'm just trying to think my way through the SQL logic at this stage. Thank you.
Upvotes: 0
Views: 176
Reputation: 142316
I did not check it thoroughly, but it seems like this could shorten that big CASE
some:
CONCAT(
( CASE
WHEN (dist_0 ... ) THEN 'z0_0'
WHEN (dist_0 ... ) THEN 'z0_1'
...
ELSE '' ),
( CASE
WHEN (dist_1 ... ) THEN 'z1_0'
WHEN (dist_1 ... ) THEN 'z1_1'
...
ELSE '' ) ) AS zone
Upvotes: 1