Niels Kristian
Niels Kristian

Reputation: 8835

How can I create an index on lat/lon col using GIST in combination with a boolean col?

I have table like this:

CREATE TABLE products (
  id INT,
  latitude numeric(9,6),
  longitude numeric(9,6),
  is_location_independent boolean
);

INSERT INTO products (id, latitude, longitude, is_location_independent) VALUES (1, 56.1508469,10.2128301,false);
INSERT INTO products (id, latitude, longitude, is_location_independent) VALUES (2,56.1508469,15.2128301,true);

I wanna create an index that efficiently covers the following query:

SELECT COUNT(*) FROM products
  WHERE 
    (ST_DWithin( ST_GeographyFromText( 'SRID=4326;POINT(' || longitude || ' ' || latitude || ')' ), ST_GeographyFromText('SRID=4326;POINT(13.621700 45.940900)'), 80000 ) OR is_location_independent IS TRUE)

I tried to create:

CREATE INDEX CONCURRENTLY index_products_location
    ON products USING gist
    (st_geographyfromtext(((('SRID=4326;POINT('::text || longitude) || ' '::text) || latitude) || ')'::text), is_location_independent)
    TABLESPACE pg_default;

But I get the following exception:

ERROR: data type boolean has no default operator class for access method "gist"

Now having the index without the combination with is_location_independent is not gonna work well. Without it, the query planner decides to do a seq scan on the table and it has millions of rows, so it's slow (8 sec).

What can I do?

PS: Postgresql 11

Update

Having added the following two indexes:

CREATE INDEX ON products USING gist (
   st_geographyfromtext(
      'SRID=4326;POINT(' || longitude || ' ' || latitude || ')'
   )
) WHERE is_location_independent;

and

CREATE INDEX ON products USING btree (
   is_location_independent
);

And having data like: is_location_independent: true => 1493 records and is_location_independent: false => 1 359 200

Doing the query, the query planner actually doesn't use the GEO-index, which is the one that would be really useful to the query...

"Finalize Aggregate  (cost=262104.14..262104.15 rows=1 width=8) (actual time=2297.030..2297.030 rows=1 loops=1)"
"  ->  Gather  (cost=262103.72..262104.13 rows=4 width=8) (actual time=2292.063..2667.239 rows=5 loops=1)"
"        Workers Planned: 4"
"        Workers Launched: 4"
"        ->  Partial Aggregate  (cost=261103.72..261103.73 rows=1 width=8) (actual time=2273.565..2273.566 rows=1 loops=5)"
"              ->  Parallel Index Scan using index_products_on_is_location_independent on products  (cost=0.43..261093.34 rows=4152 width=0) (actual time=12.878..2272.703 rows=4461 loops=5)"
"                    Filter: (((st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)) && '0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography) AND ('0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography && _st_expand(st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)), '80000'::double precision)) AND _st_dwithin(st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)), '0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography, '80000'::double precision, true)) OR (is_location_independent IS TRUE))"
"                    Rows Removed by Filter: 257706"

So as you can see, indexing is not working efficiently...

Upvotes: 2

Views: 513

Answers (2)

jjanes
jjanes

Reputation: 44147

CREATE INDEX ON products USING gist (
   st_geographyfromtext(
      'SRID=4326;POINT(' || longitude || ' ' || latitude || ')'
   )
) WHERE is_location_independent;

Isn't the WHERE clause here backwards? What is the point of indexing the locations for the points where locations don't matter?

If I create the index with the negated WHERE, then with a small change your query I can get it to use both indexes with a BitmapOr.

explain SELECT COUNT(*) FROM products WHERE 
(
    ST_DWithin( ST_GeographyFromText( 'SRID=4326;POINT(' || longitude || ' ' || latitude || ')' ), ST_GeographyFromText('SRID=4326;POINT(13.621700 45.940900)'), 800000 )
    AND 
    not is_location_independent
) 
OR is_location_independent;

Gives plan

Aggregate  (cost=63.09..63.10 rows=1 width=8)
   ->  Bitmap Heap Scan on products  (cost=8.53..63.09 rows=2 width=0)
         Recheck Cond: ((st_dwithin(st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)), '0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography, '800000'::double precision, true) AND (NOT is_location_independent)) OR is_location_independent)
         Filter: ((st_dwithin(st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)), '0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography, '800000'::double precision, true) AND (NOT is_location_independent)) OR is_location_independent)
         ->  BitmapOr  (cost=8.53..8.53 rows=2 width=0)
               ->  Bitmap Index Scan on products_st_geographyfromtext_idx1  (cost=0.00..4.38 rows=1 width=0)
                     Index Cond: (st_geographyfromtext((((('SRID=4326;POINT('::text || (longitude)::text) || ' '::text) || (latitude)::text) || ')'::text)) && _st_expand('0101000020E6100000AED85F764F3E2B40386744696FF84640'::geography, '800000'::double precision))
               ->  Bitmap Index Scan on products_is_location_independent_idx  (cost=0.00..4.14 rows=2 width=0)
                     Index Cond: (is_location_independent = true)

Without the manual re-writing of the query to include the explicit "not is_location_independent" ANDed to the geography function, the planner can't figure out that it can use the index.

On the other hand, you could make the index with no WHERE clause, and then it would work whether you re-write the query as I suggest or not. Since the vast majority of your records are location-dependent, there is not much efficiency to be gained by making the partial index.

I've created the 3 indexes below. The first 2 are used on your original query, the 1st and 3rd are used on my rewrite of it. I also made a version of the first index which was filtered for only WHERE is_location_indepedent, and that one would be used in preference to the original.

"products_is_location_independent_idx" btree (is_location_independent)
"products_st_geographyfromtext_idx" gist (st_geographyfromtext(((('SRID=4326;POINT('::text || longitude) || ' '::text) || latitude) || ')'::text))
"products_st_geographyfromtext_idx1" gist (st_geographyfromtext(((('SRID=4326;POINT('::text || longitude) || ' '::text) || latitude) || ')'::text)) WHERE NOT is_location_independent

Upvotes: 2

Laurenz Albe
Laurenz Albe

Reputation: 246403

It makes no sense to include is_location_independent in the index, since you have got an OR in the WHERE condition.

There are two possibilities:

  • If many rows have is_location_independent set to TRUE, PostgreSQL will always use a sequential scan, because that is fastest.

  • If only few rows have is_location_independent set to TRUE, create a second index: a regular B-tree index on is_location_independent alone.

    Then you can get a bitmap index scan and a bitmap or to speed up the query.

Upvotes: 2

Related Questions