Reputation: 6208
I'm not very good in sql so I have some problems with sorting. I have table which contains geographical coordinates of some locations. I need to find 10 most closest location and show them. So for this I use such sql query :
SELECT
6373*2*ATAN2(
POWER(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2) , 0.5),
POWER(1-(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2)),0.5)) AS d
FROM
(SELECT
(f.loc_lat)*3.1415/180 AS f_lat, (f.loc_lon)*3.1415/180 AS f_lon,
(s.loc_lat)*3.1415/180 AS s_lat, (s.loc_lon)*3.1415/180 AS s_lon,
(s.loc_lat)*3.1415/180 - (f.loc_lat)*3.1415/180 AS lat,
(s.loc_lon)*3.1415/180 - (f.loc_lon)*3.1415/180 AS lon
FROM htl f, htl s
WHERE f.htl_cd != s.htl_cd
AND f.loc_lat is not null
AND f.loc_lon is not null
AND f.ctry_cd = s.ctry_cd
) location
)order by d asc;
If I run this query without last line everything is fine. But when I want to sort result I have problem. Sorting is processing to long for me (nearly 30 minutes).
count of query is nearly 28 millions of lines.
So may be I need to rewrite my query or may be you know some hints to speed it up.
UPDATE
data in table htl looks like this:
htl_cd loc_lat loc_lon
GDLUA 20.690688 -103.416281
LPLRL 53.463509 -2.876497
BDLWL 41.925992 -72.670218
MKCGM 38.892569 -94.523921
first column is simple primary key. second and third columns it is latitude and longitude (coordinates)
6373*2*ATAN2(
POWER(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2) , 0.5),
POWER(1-(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2)),0.5)) AS d
this formula I use for calculate distance between two locations
UPDATE2 for thous who need execution plan
Plan hash value: 446529610
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 12M| 733M| | 186K (2)| 00:37: 22 |
| 1 | SORT ORDER BY | | 12M| 733M| 1771M| 186K (2)| 00:37: 22 |
|* 2 | HASH JOIN | | 12M| 733M| | 259 (84)| 00:00: 04 |
|* 3 | TABLE ACCESS FULL| HTL | 5081 | 148K| | 22 (0)| 00:00: 01 |
| 4 | TABLE ACCESS FULL| HTL | 5411 | 158K| | 22 (0)| 00:00: 01 |
2 - access("F"."CTRY_CD"="S"."CTRY_CD")
filter("F"."HTL_CD"<>"S"."HTL_CD")
3 - filter("F"."LOC_LAT" IS NOT NULL)
537 recursive calls
18 db block gets
152 consistent gets
68534 physical reads
0 redo size
463630284 bytes sent via SQL*Net to client
9504847 bytes received via SQL*Net from client
864044 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
12960638 rows processed
Upvotes: 2
Views: 286
Reputation: 168232
Your query is returning each distance twice. If you have the closest rows a
and b
then it will return the distance for a
to b
and for b
to a
. You can eliminate half the calculations by just comparing the distances in one of the two permutations:
Oracle 11g R2 Schema Setup:
CREATE TABLE htl ( htl_cd, loc_lat, loc_lon, ctry_cd ) AS
SELECT 'GDLUA', 20.690688, -103.416281, 1 FROM DUAL
UNION ALL SELECT 'LPLRL', 53.463509, -2.876497, 1 FROM DUAL
UNION ALL SELECT 'BDLWL', 41.925992, -72.670218, 1 FROM DUAL
UNION ALL SELECT 'MKCGM', 38.892569, -94.523921, 1 FROM DUAL
/
BEGIN
FOR i IN 1 .. 20 LOOP
INSERT INTO htl VALUES(
'GD' || LPAD( TO_CHAR( i ), 3, '0' ),
20.690688 + i / 1000,
-103.416281 + i / 1000,
1
);
END LOOP;
END;
/
Query 1:
WITH indexed_locations AS (
SELECT h.*,
ROWNUM AS idx
FROM htl h
),
location AS (
SELECT
(f.loc_lat)*3.1415/180 AS f_lat, (f.loc_lon)*3.1415/180 AS f_lon,
(s.loc_lat)*3.1415/180 AS s_lat, (s.loc_lon)*3.1415/180 AS s_lon,
(s.loc_lat)*3.1415/180 - (f.loc_lat)*3.1415/180 AS lat,
(s.loc_lon)*3.1415/180 - (f.loc_lon)*3.1415/180 AS lon
FROM indexed_locations f
INNER JOIN
indexed_locations s
ON (
f.htl_cd != s.htl_cd
AND f.ctry_cd = s.ctry_cd
AND f.idx < s.idx
)
WHERE
f.loc_lat is not null
AND f.loc_lon is not null
),
distances AS (
SELECT
6373*2*ATAN2(
POWER(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2) , 0.5),
POWER(1-(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2)),0.5)) AS d
FROM location
ORDER BY d ASC
)
SELECT d
FROM distances
WHERE ROWNUM < 11
| D |
|----------------|
| 0.152300994984 |
| 0.152301463917 |
| 0.152301932829 |
| 0.152302401722 |
| 0.152302870595 |
| 0.152303339447 |
| 0.15230380828 |
| 0.152304277093 |
| 0.152304745885 |
| 0.152305214658 |
You can also move all the calculation into a user-defined function and make the function DETERMINISTIC
(which will help if the same input parameters are used) - however, beware when profiling this that when it is run once then the results may be cached and subsequent runs may use the cached results so you will not see the true cost of the query every time:
CREATE OR REPLACE FUNCTION calc_Lat_Long_Distance(
f_loc_lat HTL.LOC_LAT%TYPE,
f_loc_lon HTL.LOC_LON%TYPE,
s_loc_lat HTL.LOC_LAT%TYPE,
s_loc_lon HTL.LOC_LON%TYPE
) RETURN NUMBER DETERMINISTIC
AS
PI CONSTANT REAL := 3.1415/180;
f_lat REAL := f_loc_lat * PI;
f_lon REAL := f_loc_lon * PI;
s_lat REAL := s_loc_lat * PI;
s_lon REAL := s_loc_lon * PI;
lat_dif REAL := s_lat - f_lat;
lon_dif REAL := s_lon - f_lon;
p REAL := POWER(SIN(lat_dif/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon_dif/2),2);
BEGIN
RETURN 6373*2*ATAN2( POWER(p, 0.5), POWER(1-p,0.5) );
END;
/
You can also use RESULT_CACHE
instead of DETERMINISTIC
- profile them and see whether one makes a particular difference.
Query 2:
WITH indexed_locations AS (
SELECT h.*,
ROWNUM AS idx
FROM htl h
),
distances AS (
SELECT calc_Lat_Long_Distance(
f.loc_lat,
f.loc_lon,
s.loc_lat,
s.loc_lon
) AS d
FROM indexed_locations f
INNER JOIN
indexed_locations s
ON (
f.htl_cd != s.htl_cd
AND f.ctry_cd = s.ctry_cd
AND f.idx < s.idx
)
WHERE
f.loc_lat is not null
AND f.loc_lon is not null
ORDER BY d ASC
)
SELECT d
FROM distances
WHERE ROWNUM < 11
| D |
|----------------|
| 0.152300994984 |
| 0.152301463917 |
| 0.152301932829 |
| 0.152302401722 |
| 0.152302870595 |
| 0.152303339447 |
| 0.15230380828 |
| 0.152304277093 |
| 0.152304745885 |
| 0.152305214658 |
Upvotes: 2
Reputation: 621
SELECT * FROM (
SELECT
6373*2*ATAN2(
POWER(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2) , 0.5),
POWER(1-(POWER(SIN(lat/2),2) + COS(f_lat)*COS(s_lat)*POWER(SIN(lon/2),2)),0.5)) AS d,
ROW_NUMBER() OVER (ORDER BY d asc) AS rownum
FROM
(SELECT
(f.loc_lat)*3.1415/180 AS f_lat, (f.loc_lon)*3.1415/180 AS f_lon,
(s.loc_lat)*3.1415/180 AS s_lat, (s.loc_lon)*3.1415/180 AS s_lon,
(s.loc_lat)*3.1415/180 - (f.loc_lat)*3.1415/180 AS lat,
(s.loc_lon)*3.1415/180 - (f.loc_lon)*3.1415/180 AS lon
FROM htl f, htl s
WHERE f.htl_cd != s.htl_cd
AND f.loc_lat is not null
AND f.loc_lat is not null
AND f.ctry_cd = s.ctry_cd
) location
)WHERE rownum <= 10;
If you need only top 10 records, try above.
Upvotes: 0
Reputation: 52386
If I run this query without last line everything is fine.
Do you mean that you get the first rows back from the query quickly, or that the entire set of 28 million rows is returned quickly? Your query is pretty computationally intensive.
But when I want to sort result I have problem. Sorting is processing to long for me (nearly 30 minutes).
A sort requires that all values are computed and that they are all ordered before any rows are returned. If all rows are returned from an unsorted query quickly, then it is the sort stage that is taking a long time, and the likely reason is that the sort is spilling to temporary disk. You can monitor the sort space required through the v$sql_workarea_active view while the query is running, and that will tell you how much memory you need to allocate in order to avoid the use of temporary disk space.
You can also get an accurate breakout of timings of each stage in the execution plan by using the gather_plan_statistics hint and DBMS_XPlan. http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_xplan.htm#ARPLS70137
Finally, do you actually need all 28 million rows to be sorted? Do you need to know which points are the furthest apart, or just the ones that are closest together? If the latter, perhaps you can consider filtering the result set prior to the ORDER BY.
Upvotes: 2