jkulak
jkulak

Reputation: 908

How to make query faster?

My tracks table with ~3mln records (grows 500/day) has around 30 columns but I only use 15 in WHERE clause. Query takes 4800ms on average with no other users/processes using database. How to make it faster? I would like to see results closer to 100ms.

People looking for songs (tracks) fill out a form :

99% of use cases is that SELECT query:

SELECT
    "public"."tracks"."sys_id",
    "public"."tracks"."all_artists",
    "public"."tracks"."name",
    "public"."tracks"."genres",
    "public"."tracks"."release_date",
    "public"."tracks"."tempo",
    "public"."tracks"."popularity",
    "public"."tracks"."danceability",
    "public"."tracks"."energy",
    "public"."tracks"."speechiness",
    "public"."tracks"."acousticness",
    "public"."tracks"."instrumentalness",
    "public"."tracks"."liveness",
    "public"."tracks"."valence",
    "public"."tracks"."main_artist_popularity",
    "public"."tracks"."main_artist_followers",
    "public"."tracks"."key",
    "public"."tracks"."preview_url"
FROM
    "public"."tracks"
WHERE
    (
    "public"."tracks"."name" LIKE '%oultec%'
    OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
    )
    AND ("public"."tracks"."genres_string" LIKE '%rum%')
    AND "public"."tracks"."tempo" >= '80'
    AND "public"."tracks"."tempo" <= '210'
    AND "public"."tracks"."popularity" >= '0'
    AND "public"."tracks"."popularity" <= '100'
    AND "public"."tracks"."main_artist_popularity" >= '1'
    AND "public"."tracks"."main_artist_popularity" <= '100'
    AND "public"."tracks"."main_artist_followers" >= '1'
    AND "public"."tracks"."main_artist_followers" <= '50000000'
    AND "public"."tracks"."danceability" >= '0'
    AND "public"."tracks"."danceability" <= '1000'
    AND "public"."tracks"."energy" >= '0'
    AND "public"."tracks"."energy" <= '1000'
    AND "public"."tracks"."speechiness" >= '0'
    AND "public"."tracks"."speechiness" <= '1000'
    AND "public"."tracks"."acousticness" >= '0'
    AND "public"."tracks"."acousticness" <= '1000'
    AND "public"."tracks"."instrumentalness" >= '0'
    AND "public"."tracks"."instrumentalness" <= '1000'
    AND "public"."tracks"."liveness" >= '0'
    AND "public"."tracks"."liveness" <= '1000'
    AND "public"."tracks"."valence" >= '0'
    AND "public"."tracks"."valence" <= '1000'
    AND "public"."tracks"."release_date" >= '2020-01-01'
    AND "public"."tracks"."key" = '10'
ORDER BY
    "public"."tracks"."release_date" DESC,
    "public"."tracks"."popularity" DESC,
    "public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;

Indexes (indices):

PRIMARY sys_id
UNIQUE  main_artist, name, duration_ms
INDEX   energy
INDEX   tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key

EXPLAIN/ANALYZE :

Limit  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4582.756..4584.010 rows=0 loops=1)
  ->  Sort  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4546.306..4547.559 rows=0 loops=1)
        Sort Key: release_date DESC, popularity DESC, sys_id
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1000.00..308411.31 rows=1 width=279) (actual time=4546.121..4547.374 rows=0 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Seq Scan on tracks  (cost=0.00..307411.21 rows=1 width=279) (actual time=4473.923..4473.926 rows=0 loops=3)
                    Filter: ((genres_string ~~ '%rum%'::text) AND (tempo >= '80'::smallint) AND (tempo <= '210'::smallint) AND (popularity >= '0'::smallint) AND (popularity <= '100'::smallint) AND (main_artist_popularity >= '1'::smallint) AND (main_artist_popularity <= '100'::smallint) AND (main_artist_followers >= 1) AND (main_artist_followers <= 50000000) AND (danceability >= '0'::smallint) AND (danceability <= '1000'::smallint) AND (energy >= '0'::smallint) AND (energy <= '1000'::smallint) AND (speechiness >= '0'::smallint) AND (speechiness <= '1000'::smallint) AND (acousticness >= '0'::smallint) AND (acousticness <= '1000'::smallint) AND (instrumentalness >= '0'::smallint) AND (instrumentalness <= '1000'::smallint) AND (liveness >= '0'::smallint) AND (liveness <= '1000'::smallint) AND (valence >= '0'::smallint) AND (valence <= '1000'::smallint) AND (release_date >= '2020-01-01'::date) AND (key = '10'::smallint) AND ((name ~~ '%oultec%'::text) OR (all_artists_string ~~ '%oultec%'::text)))
                    Rows Removed by Filter: 1034268
Planning Time: 5.145 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 18.941 ms, Inlining 0.000 ms, Optimization 8.644 ms, Emission 110.864 ms, Total 138.449 ms
Execution Time: 4667.446 ms

PostgreSQL running from "official" image: postgres:14.1-alpine:

system htop

Table structure:

tracks table structure

Website that runs the queries (through an API/backend, more fields with min/max integers, but not shown here):

Form view for running the query

Upvotes: 2

Views: 91

Answers (1)

O. Jones
O. Jones

Reputation: 108651

Your query does LIKE '%something%' fulltext searches and range scans on dates and numbers. But BTREE indexes, the default, can handle just one range scan. And, they can't handle LIKE '%something%' at all. So, you are getting a full table scan for every query. 4.8s for three megarows isn't too bad considering.

For your column LIKE '%something%' searches you could try using trigram indexes, a feature of postgreSQL. This code will create trigram indexes on name. That might narrow down the choices and make it so you have to scan less data.

CREATE EXTENSION pg_trgm;  -- you may or may not need this statement.
CREATE INDEX CONCURRENTLY tracks_name
     ON tracks
  USING GIN (name gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_all_artists_string
     ON tracks
  USING GIN (all_artists_string gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_genres_string
     ON tracks
  USING GIN (genres_string gin_trgm_ops);

But you'll still have to scan through all the matching tracks.

If you create these indexes and then refactor the first bits of your WHERE clause to use set computations like so, you may (or may not) get even better performance.

WHERE sys_id IN (
     (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
       UNION 
      SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%'
     )
     INTERSECT
     SELECT sys_id FROM tracks WHERE genres_string LIKE '%oultec%'
    )
  AND tempo >= '80' ... 

But the fact remains that SQL isn't great for all these range scans.

Upvotes: 5

Related Questions