Dan Gravell
Dan Gravell

Reputation: 8260

Casting in join filter - does it preclude an index scan?

So I'm joining over a good few tables, and the results are horrendous.

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                                                     |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop Left Join  (cost=15291.63..367830181335340285952.00 rows=8062002970089247211520 width=676)                                                                                         |
|   Join Filter: CASE WHEN (x415."SOURCE_URI" IS NOT NULL) THEN ((x415."SOURCE_URI")::text = (x47."SOURCE_URI")::text) ELSE NULL::boolean END                                                    |
|   ->  Nested Loop Left Join  (cost=15291.63..53031043002887.09 rows=1767529209075750 width=900)                                                                                                |
|         Join Filter: CASE WHEN (CASE WHEN (x414."CAT" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN ((x414."CAT")::text = (x415."CAT")::text) ELSE NULL::boolean END                |
|         ->  Nested Loop Left Join  (cost=15291.63..5166730086.91 rows=172147962900 width=846)                                                                                                  |
|               Join Filter: CASE WHEN (CASE WHEN (x297."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x297."ID" = x258."TRACK") ELSE NULL::boolean END                          |
|               ->  Nested Loop Left Join  (cost=15291.63..2291101.66 rows=71430690 width=816)                                                                                                   |
|                     Join Filter: CASE WHEN (CASE WHEN (x297."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x297."ID" = x213."TRACK") ELSE NULL::boolean END                    |
|                     ->  Nested Loop Left Join  (cost=15291.63..148094.28 rows=33270 width=786)                                                                                                 |
|                           Join Filter: CASE WHEN (CASE WHEN (x330."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x330."ID" = x297."MEDIUM") ELSE NULL::boolean END             |
|                           ->  Nested Loop Left Join  (cost=15291.63..146700.24 rows=4 width=724)                                                                                               |
|                                 ->  Nested Loop Left Join  (cost=15291.22..146681.05 rows=4 width=654)                                                                                         |
|                                       ->  Nested Loop Left Join  (cost=15290.80..146667.32 rows=4 width=542)                                                                                   |
|                                             ->  Nested Loop Left Join  (cost=15290.25..146643.82 rows=4 width=496)                                                                             |
|                                                   ->  Nested Loop Left Join  (cost=15289.69..146621.36 rows=4 width=448)                                                                       |
|                                                         ->  Hash Right Join  (cost=15289.27..146602.54 rows=4 width=438)                                                                       |
|                                                               Hash Cond: ((x355."SOURCE_URI")::text = (x410."SOURCE_URI")::text)                                                               |
|                                                               ->  Seq Scan on "RELEASE_IMAGE" x355  (cost=0.00..111558.45 rows=5267945 width=120)                                              |
|                                                               ->  Hash  (cost=15289.22..15289.22 rows=4 width=318)                                                                             |
|                                                                     ->  Hash Right Join  (cost=13996.67..15289.22 rows=4 width=318)                                                            |
|                                                                           Hash Cond: ((x376."SOURCE_URI")::text = (x410."SOURCE_URI")::text)                                                   |
|                                                                           ->  Seq Scan on "RELEASE_VOTED_TAG" x376  (cost=0.00..1118.94 rows=46294 width=82)                              
|
|                                                                           ->  Hash  (cost=13996.62..13996.62 rows=4 width=236)                                                                 |
|                                                                                 ->  Hash Right Join  (cost=13934.61..13996.62 rows=4 width=236)                                                |
|                                                                                       Hash Cond: ((x330."SOURCE_URI")::text = (x410."SOURCE_URI")::text)                                       |
|                                                                                       ->  Seq Scan on "MEDIUM" x330  (cost=0.00..53.00 rows=2400 width=83)                                     |
|                                                                                       ->  Hash  (cost=13934.56..13934.56 rows=4 width=153)                                                     |
|                                                                                             ->  Nested Loop  (cost=0.56..13934.56 rows=4 width=153)                                            |
|                                                                                                   ->  Seq Scan on "RELEASE_BARCODE"  (cost=0.00..13900.21 rows=4 width=40)                     |
|                                                                                                         Filter: (("BARCODE")::text = ANY ('{75992731324,075992731324,0075992731324}'::text[])) |
|                                                                                                   ->  Index Scan using "RELEASE_pkey" on "RELEASE" x410  (cost=0.56..8.58 rows=1 width=153)    |
|                                                                                                         Index Cond: (("SOURCE_URI")::text = ("RELEASE_BARCODE"."SOURCE_URI")::text)            |
|                                                         ->  Index Only Scan using "RELEASE_CAT_PK" on "RELEASE_CAT_NO" x414  (cost=0.41..4.70 rows=1 width=74)                                 |
|                                                               Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text)                                                                           |
|                                                   ->  Index Only Scan using "RELEASE_GENRE_PK" on "RELEASE_GENRE" x409  (cost=0.56..5.61 rows=1 width=48)                                      |
|                                                         Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text)                                                                                 |
|                                             ->  Index Only Scan using "RELEASE_TYPE_PK" on "RELEASE_TYPE" x394  (cost=0.56..5.83 rows=4 width=46)                                              |
|                                                   Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text)                                                                                       |
|                                       ->  Index Only Scan using "RELEASE_URL_PK" on "RELEASE_URL" x165  (cost=0.41..3.41 rows=2 width=112)                                                     |
|                                             Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text)                                                                                             |
|                                 ->  Index Scan using release_label_source_uri on "RELEASE_LABEL" x111  (cost=0.41..4.79 rows=1 width=134)                                                      |
|                                       Index Cond: ((x410."SOURCE_URI")::text = ("SOURCE_URI")::text)                                                                                           |
|                           ->  Materialize  (cost=0.00..437.53 rows=16635 width=62)                                                                                                             |
|                                 ->  Seq Scan on "TRACK" x297  (cost=0.00..354.35 rows=16635 width=62)                                                                                          |
|                     ->  Materialize  (cost=0.00..97.41 rows=4294 width=30)                                                                                                                     |
|                           ->  Seq Scan on "TRACK_COMPOSER" x213  (cost=0.00..75.94 rows=4294 width=30)                                                                                         |
|               ->  Materialize  (cost=0.00..110.30 rows=4820 width=30)                       
|                     ->  Seq Scan on "TRACK_ARTIST" x258  (cost=0.00..86.20 rows=4820 width=30)                                                                                                 |
|         ->  Materialize  (cost=0.00..579.02 rows=20535 width=74)                                                                                                                               |
|               ->  Seq Scan on "RELEASE_CAT_NO" x415  (cost=0.00..476.35 rows=20535 width=74)                                                                                                   |
|   ->  Materialize  (cost=0.00..366235.13 rows=9122342 width=40)                                                                                                                                |
|         ->  Seq Scan on "RELEASE" x47  (cost=0.00..249354.42 rows=9122342 width=40)                                                                                                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

https://explain.depesz.com/s/3DSD

My first reaction was to add some indexes. So I added the following:

CREATE INDEX RELEASE_CAT_CAT_NO on "RELEASE_CAT_NO" ("CAT");
CREATE INDEX "track_medium" on "TRACK" ("MEDIUM");
CREATE INDEX "track_composer_track" on "TRACK_COMPOSER" ("TRACK");
CREATE INDEX "track_artist_track" on "TRACK_ARTIST" ("TRACK");

But this makes no difference. When I perform simpler queries I can see the indexes being used, but still not for this query.

That said, adding indexes did help for:

CREATE INDEX "release_label_source_uri" on "RELEASE_LABEL" ("SOURCE_URI");

I'm wondering whether the join filters, which potentially cast values to different types, are responsible:

|         Join Filter: CASE WHEN (CASE WHEN (x414."CAT" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN ((x414."CAT")::text = (x415."CAT")::text) ELSE NULL::boolean END                |

CAT is a varchar and I created an index for it as above. The code above code is taken when a subquery performs a select for, returning 1 or 0 depending on whether CAT is null.

I assume this only occurs on results, and doesn't impact the type of scan? But the reason I am wondering is because it's appearing in the "join filter" output.

This is a query generated by the Slick framework btw. PostgreSQL 9.6.3.

Upvotes: 0

Views: 579

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247495

Some ideas:

  • You have exclusively outer joins. That limits the possible execution paths considerably.
    Check if you really need outer joins everywhere or if you could use inner joins in some places.

  • Many of your join conditions are really complicated and allow only nested loop joins, which will hurt performance a lot if many rows are involved.
    Try to simplify them!

    For example, consider this:

    ... LEFT JOIN ...
    ON CASE
          WHEN (x415."SOURCE_URI" IS NOT NULL)
          THEN ((x415."SOURCE_URI")::text = (x47."SOURCE_URI")::text)
          ELSE NULL::boolean
       END
    

    This brain damaged piece of SQL could be written as

    ... LEFT JOIN ...
    ON x415."SOURCE_URI" = x47."SOURCE_URI"
    

    Then PostgreSQL could use a hash join, and if you have a lot of rows, that will speed up your query considerably.

  • One more index that could help with your execution plan, depending on how big the table is:

    CREATE INDEX ON "RELEASE_BARCODE"("BARCODE");
    

Upvotes: 1

Related Questions