Reputation: 908
My tracks table has ~3 million records (growing by ~500/day) and around 30 columns, of which I only use 15 in the WHERE clause. The query takes 4800 ms on average with no other users or processes using the database. How can I make it faster? I would like to see results closer to 100 ms.

People looking for songs (tracks) fill out a form; 99% of use cases is this SELECT query:
SELECT
    "public"."tracks"."sys_id",
    "public"."tracks"."all_artists",
    "public"."tracks"."name",
    "public"."tracks"."genres",
    "public"."tracks"."release_date",
    "public"."tracks"."tempo",
    "public"."tracks"."popularity",
    "public"."tracks"."danceability",
    "public"."tracks"."energy",
    "public"."tracks"."speechiness",
    "public"."tracks"."acousticness",
    "public"."tracks"."instrumentalness",
    "public"."tracks"."liveness",
    "public"."tracks"."valence",
    "public"."tracks"."main_artist_popularity",
    "public"."tracks"."main_artist_followers",
    "public"."tracks"."key",
    "public"."tracks"."preview_url"
FROM
    "public"."tracks"
WHERE
    (
        "public"."tracks"."name" LIKE '%oultec%'
        OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
    )
    AND ("public"."tracks"."genres_string" LIKE '%rum%')
    AND "public"."tracks"."tempo" >= '80'
    AND "public"."tracks"."tempo" <= '210'
    AND "public"."tracks"."popularity" >= '0'
    AND "public"."tracks"."popularity" <= '100'
    AND "public"."tracks"."main_artist_popularity" >= '1'
    AND "public"."tracks"."main_artist_popularity" <= '100'
    AND "public"."tracks"."main_artist_followers" >= '1'
    AND "public"."tracks"."main_artist_followers" <= '50000000'
    AND "public"."tracks"."danceability" >= '0'
    AND "public"."tracks"."danceability" <= '1000'
    AND "public"."tracks"."energy" >= '0'
    AND "public"."tracks"."energy" <= '1000'
    AND "public"."tracks"."speechiness" >= '0'
    AND "public"."tracks"."speechiness" <= '1000'
    AND "public"."tracks"."acousticness" >= '0'
    AND "public"."tracks"."acousticness" <= '1000'
    AND "public"."tracks"."instrumentalness" >= '0'
    AND "public"."tracks"."instrumentalness" <= '1000'
    AND "public"."tracks"."liveness" >= '0'
    AND "public"."tracks"."liveness" <= '1000'
    AND "public"."tracks"."valence" >= '0'
    AND "public"."tracks"."valence" <= '1000'
    AND "public"."tracks"."release_date" >= '2020-01-01'
    AND "public"."tracks"."key" = '10'
ORDER BY
    "public"."tracks"."release_date" DESC,
    "public"."tracks"."popularity" DESC,
    "public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;
Indexes:
PRIMARY sys_id
UNIQUE main_artist, name, duration_ms
INDEX energy
INDEX tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key
EXPLAIN ANALYZE:
Limit  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4582.756..4584.010 rows=0 loops=1)
  ->  Sort  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4546.306..4547.559 rows=0 loops=1)
        Sort Key: release_date DESC, popularity DESC, sys_id
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1000.00..308411.31 rows=1 width=279) (actual time=4546.121..4547.374 rows=0 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Seq Scan on tracks  (cost=0.00..307411.21 rows=1 width=279) (actual time=4473.923..4473.926 rows=0 loops=3)
                    Filter: ((genres_string ~~ '%rum%'::text) AND (tempo >= '80'::smallint) AND (tempo <= '210'::smallint) AND (popularity >= '0'::smallint) AND (popularity <= '100'::smallint) AND (main_artist_popularity >= '1'::smallint) AND (main_artist_popularity <= '100'::smallint) AND (main_artist_followers >= 1) AND (main_artist_followers <= 50000000) AND (danceability >= '0'::smallint) AND (danceability <= '1000'::smallint) AND (energy >= '0'::smallint) AND (energy <= '1000'::smallint) AND (speechiness >= '0'::smallint) AND (speechiness <= '1000'::smallint) AND (acousticness >= '0'::smallint) AND (acousticness <= '1000'::smallint) AND (instrumentalness >= '0'::smallint) AND (instrumentalness <= '1000'::smallint) AND (liveness >= '0'::smallint) AND (liveness <= '1000'::smallint) AND (valence >= '0'::smallint) AND (valence <= '1000'::smallint) AND (release_date >= '2020-01-01'::date) AND (key = '10'::smallint) AND ((name ~~ '%oultec%'::text) OR (all_artists_string ~~ '%oultec%'::text)))
                    Rows Removed by Filter: 1034268
Planning Time: 5.145 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 18.941 ms, Inlining 0.000 ms, Optimization 8.644 ms, Emission 110.864 ms, Total 138.449 ms
Execution Time: 4667.446 ms
PostgreSQL is running from the "official" image: postgres:14.1-alpine

Table structure: [screenshot omitted]

Website that runs the queries (through an API/backend; it has more fields with min/max integers, not shown here): [screenshot omitted]
Upvotes: 2
Views: 91
Reputation: 108651
Your query does LIKE '%something%' fulltext searches plus range scans on dates and numbers. But BTREE indexes, the default kind, can handle just one range scan per query, and they can't handle LIKE '%something%' at all. So you are getting a full table scan for every query; 4.8 s for three megarows isn't too bad, considering.
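To see why, here is a minimal illustration (the index name tracks_name_prefix is hypothetical): a btree with text_pattern_ops can serve an anchored prefix pattern, but a leading wildcard defeats it.

-- Hypothetical illustration: btree (with text_pattern_ops) helps anchored prefixes only.
CREATE INDEX tracks_name_prefix ON tracks (name text_pattern_ops);

-- Anchored prefix: the planner can use the btree index.
EXPLAIN SELECT sys_id FROM tracks WHERE name LIKE 'oultec%';

-- Leading wildcard: no usable btree path, so you get a sequential scan.
EXPLAIN SELECT sys_id FROM tracks WHERE name LIKE '%oultec%';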
For your LIKE '%something%' column searches you could try trigram indexes, a PostgreSQL feature (the pg_trgm extension). The code below creates trigram indexes on name, all_artists_string, and genres_string. That might narrow down the choices and make it so you have to scan less data.
CREATE EXTENSION pg_trgm;  -- you may or may not need this statement

CREATE INDEX CONCURRENTLY tracks_name
    ON tracks USING GIN (name gin_trgm_ops);

CREATE INDEX CONCURRENTLY tracks_all_artists_string
    ON tracks USING GIN (all_artists_string gin_trgm_ops);

CREATE INDEX CONCURRENTLY tracks_genres_string
    ON tracks USING GIN (genres_string gin_trgm_ops);
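A quick way to check that the planner actually picks one of these up (plan shapes will vary with your data and settings):

ANALYZE tracks;  -- refresh statistics first

EXPLAIN (ANALYZE, BUFFERS)
SELECT sys_id
FROM tracks
WHERE name LIKE '%oultec%';
-- If the pattern is selective enough, expect a Bitmap Index Scan on tracks_name
-- feeding a Bitmap Heap Scan, instead of the Parallel Seq Scan you have now.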
But you'll still have to scan through all the matching tracks.
If you create these indexes and then refactor the first part of your WHERE clause to use set operations like so, you may (or may not) get even better performance.
WHERE sys_id IN (
    (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
     UNION
     SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%')
    INTERSECT
    SELECT sys_id FROM tracks WHERE genres_string LIKE '%rum%'
)
AND tempo >= '80' ...
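Spelled out against your original query, that looks roughly like the sketch below (it just reuses the predicates from the question; whether it wins depends on how selective the trigram matches are):

SELECT sys_id, all_artists, name, genres, release_date, tempo, popularity,
       danceability, energy, speechiness, acousticness, instrumentalness,
       liveness, valence, main_artist_popularity, main_artist_followers,
       "key", preview_url
FROM tracks
WHERE sys_id IN (
        -- trigram-indexable text matches, combined as sets
        (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
         UNION
         SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%')
        INTERSECT
        SELECT sys_id FROM tracks WHERE genres_string LIKE '%rum%'
      )
  -- the remaining range filters, unchanged from the original query
  AND tempo BETWEEN 80 AND 210
  AND popularity BETWEEN 0 AND 100
  AND main_artist_popularity BETWEEN 1 AND 100
  AND main_artist_followers BETWEEN 1 AND 50000000
  AND danceability BETWEEN 0 AND 1000
  AND energy BETWEEN 0 AND 1000
  AND speechiness BETWEEN 0 AND 1000
  AND acousticness BETWEEN 0 AND 1000
  AND instrumentalness BETWEEN 0 AND 1000
  AND liveness BETWEEN 0 AND 1000
  AND valence BETWEEN 0 AND 1000
  AND release_date >= '2020-01-01'
  AND "key" = 10
ORDER BY release_date DESC, popularity DESC, sys_id ASC
LIMIT 5 OFFSET 0;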
But the fact remains that SQL isn't great for all these range scans.
Upvotes: 5