Mattis
Mattis

Reputation: 5096

Can't get this PostgreSQL query to work any better

I'm running PostgreSQL 9.6 with PostGIS 2.3.3

I'm trying to make this rather critical query go faster (looking for users within 1000 meters of a location), but I'm having troubles setting up the right indexes.

Could someone point me in the right direction?

users are 200k rows, locations are 1200 rows, user_push_tokens are 155k rows

users table:

create table users
(
  id serial not null
    constraint users_pkey
      primary key,
  (an additional 20-ish columns),
  geo_point geometry(Point,4326)
);`

create unique index users_id
  on users (id);

create index users_geo_point_idx
  on users using gist(geo_point);

user_push_tokens table:

create table user_push_tokens
(
  user_id integer not null
    constraint push_tokens_user_id_fkey
      references users
        on delete cascade,
  push_token varchar(255) not null,
  push_provider varchar(64) not null,
  app varchar(64) default 'app'::character varying not null,
  id integer default nextval('user_push_tokens_id_seq'::regclass) not null
    constraint user_push_tokens_pkey
      primary key,
  active boolean default true not null,
);

create index trinity_unique_index
  on user_push_tokens (user_id, app, push_token);

create index user_push_tokens_token_fetch_idx
  on user_push_tokens (user_id, app, active);

create index user_id_index
  on user_push_tokens (user_id);

locations table:

create table locations
(
  id integer default nextval('locations_id_seq'::regclass) not null
    constraint locations_pkey
      primary key,
  (another 25 columns),
  geo_point geometry(Point,4326)
);

create unique index locations_id_key
  on locations (id);

create index locations_geo_point_idx
  on locations using gist(geo_point);

The query

EXPLAIN ANALYSE SELECT
    u.id AS user_id,
    upt.push_provider,
    upt.push_token
  FROM users u
    JOIN locations l ON l.id = 3896
    JOIN user_push_tokens upt
      ON upt.user_id = u.id AND upt.active = true AND upt.app = 'app'
  WHERE ST_DistanceSphere(u.geo_point, l.geo_point) <= 1000;

results in

Nested Loop  (cost=26087.06..63658.87 rows=30605 width=107) (actual time=353.304..887.371 rows=2498 loops=1)
  Join Filter: (_st_distance(geography(u.geo_point), geography(l.geo_point), '0'::double precision, false) <= '1000'::double precision)
  Rows Removed by Join Filter: 89539
  ->  Index Scan using locations_id_key on locations l  (cost=0.28..8.29 rows=1 width=32) (actual time=0.009..0.014 rows=1 loops=1)
        Index Cond: (id = 3896)
  ->  Hash Join  (cost=26086.78..39090.07 rows=91815 width=139) (actual time=352.437..657.228 rows=92037 loops=1)
        Hash Cond: (upt.user_id = u.id)
        ->  Seq Scan on user_push_tokens upt  (cost=0.00..7162.82 rows=91815 width=107) (actual time=0.032..103.512 rows=92037 loops=1)
              Filter: (active AND ((app)::text = 'app'::text))
              Rows Removed by Filter: 62437
        ->  Hash  (cost=22114.46..22114.46 rows=195546 width=36) (actual time=352.199..352.199 rows=195589 loops=1)
              Buckets: 65536  Batches: 4  Memory Usage: 3563kB
              ->  Seq Scan on users u  (cost=0.00..22114.46 rows=195546 width=36) (actual time=0.014..214.976 rows=195589 loops=1)

More readable explain output

Thank you for your time.

Upvotes: 2

Views: 63

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246033

An index like this will make it slightly faster:

CREATE INDEX ON user_push_tokens (app) WHERE active;

But there is no way to speed up the nested loop with the function call.

One idea to investigate is to add a second “looser” distance check that is indexable, may have false positives, but filters out most of the rows.

Not sure if that is possible, but worth thinking about.

Upvotes: 1

Related Questions