GuitarStrum
GuitarStrum

Reputation: 723

Restructuring/Optimizing: Avoiding table scan and Select statement is slower in plsql function

I have a PL/pgSQL function to check if a point is in a polygon. To start I want to do an AABB test on the min/max latitude and longitudes so I don't have to do the raycast. I'm doing the following inside the function in order to grab the minimums and maximums.

My problem is that each select max() / select min() statements take about 500ms to execute inside the function. If I do the same statements outside the function, the queries take about 20 ms each. Why are they so slow inside the function?

 select max(latitude) into maxLat from points where location=name_input;
 select max(longitude) into maxLong from points where location=name_input;
 select min(latitude) into minLat from points where location=name_input;
 select min(longitude) into minLong from points where location=name_input;

Here's the complete function. As you can guess from the code, I know very little SQL and I'm writing this for both postgresql and oracle (so some parts might just be a bad port, like having two arrays for lat/long instead of one array of points, which I did in oracle). I know that my call is really slow and the plan shows that it does a table scan even though I index the function and the columns on it. I was told in another question that it's impossible to index on my function because I pass in a string as a variable, so I'm trying to figure out how to fix it.

CREATE OR REPLACE FUNCTION GEOLOCATION_CONTAINS
(
name_input IN VARCHAR, --Name of the geofilter
lat_in IN DOUBLE PRECISION, --latitude of the point to test
long_in IN DOUBLE PRECISION --longitude of the point to test
) 
RETURNS INTEGER 
AS $$
DECLARE
  j int := 0; --index to previous point
  inside int := 0; -- If the point is inside or not
  numPoints int := 0; --Total number of points in the geo filter
  pointsLAT DOUBLE PRECISION[]; --An array of latitudes
  pointsLONG DOUBLE PRECISION[]; --An array of longitudes
  maxLat double precision := 0.0;
  maxLong double precision := 0.0;
  minLat double precision := 0.0;
  minLong double precision := 0.0;
BEGIN

  --Populate the array of points by grabbing all the points in a filter
  --The convention seems to be that order of a geo filter's points is defined by the order of their IDs, increasing
  pointsLAT := array(SELECT latitude FROM points where location=name_input ORDER BY ID);
  pointsLONG := array(SELECT longitude FROM points where location=name_input ORDER BY ID);

  --Get the max/min lat/long to return before raycasting
  select max(latitude) into maxLat from points where location=name_input;
  select max(longitude) into maxLong from points where location=name_input;
  select min(latitude) into minLat from points where location=name_input;
  select min(longitude) into minLong from points where location=name_input;

  --Check if it's even possible to be in the filter. If it's outside the bounds, return 0 for outside.
  IF lat_in <= minLat OR lat_in >= maxLat OR long_in <= minLong OR long_in >= maxLong THEN
    return 0;
  END IF;

  --Get the total number of points in the points array
  SELECT COUNT(*) into numPoints from points where location=name_input;

  --Init the pointer to the prev point index to the last guy in the array
  j := numPoints;

  --Perform raycast intersection test over the polgygon
  for i IN 1..numPoints loop
      --Test for intersection on an edge of the polygon
      if((pointsLAT[i]>lat_in) != (pointsLAT[j]>lat_in)) then
        if (long_in < (pointsLONG[j]-pointsLONG[i]) * (lat_in-pointsLAT[i]) / (pointsLAT[j]-pointsLAT[i]) + pointsLONG[i]) then
          --Intersected a line, toggle in/out
          if(inside = 0) then
            inside := 1;
          else
            inside := 0;

          end if;
        end if;
      end if;    

  --set J to previous before incrementing i
  j := i;
  end loop;

  RETURN inside;
END; $$ LANGUAGE plpgsql IMMUTABLE;

I'm looking at finding a way to get a function index to work, because it's just too slow if I run it on a table with 200,000+ rows (about 40 seconds now with the optimizations provided so far in the answers). To compare, doing a select * of all the objects and running it through Java's Polygon class takes 2 seconds, so obviously I'm doing something wrong in my plsql implementation. I'm currently reading tutorials and I see things like inline functions and views to speed things up, but I'm not exactly sure what kind of thing to read into in order to make it faster.

Upvotes: 0

Views: 83

Answers (2)

user330315
user330315

Reputation:

You can reduce all seven SQL statements into a single one:

select max(latitude), 
       max(longitude), 
       min(latitude), 
       min(longitude), 
       array_agg(latitude ORDER BY ID), 
       array_agg(longitude ORDER BY ID),
       COUNT(*) over ()
   into into maxLat, maxlong, minLat, minLong, pointsLAT, pointsLONG, numPoints 
from points 
where location = name_input;

I have no experience with GIS processing so I might be completely wrong with the following:

It seems that you are storing the polygon as multiple rows in the table. However in Postgres and even more so with the PostGIS extension you can store polygons in a single column and then you have native operators that can check if a point is inside the polygon. Queries using those operators can make use of GiST or GIN indexes.

My understanding is, that for any serious GIS work you should definitely look into PostGIS. The built-in geometric data types in Postgres only offer a very basic set of features.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270763

Why have four statements?

select max(latitude), max(longitude), min(latitude), min(longitude)
into maxLat, maxlong, minLat, minLong
from points
where location = name_input;

This doesn't address why the call seems faster outside the function rather than inside. But there is other overhead to calling a function.

Upvotes: 2

Related Questions