Reputation: 6260
The Schema
The Problem
My Page 1 Query
Find posts with likes > 0 that contain the word "scam" in their title or summary and are tagged "for":
SELECT
fi.feed_item_id,
pubdate,
link,
title,
summary,
author,
feed_id,
likes,
dislikes,
love,
anger,
tags
FROM
feed_items fi
LEFT JOIN
feed_item_tags t
ON fi.feed_item_id = t.feed_item_id
LEFT JOIN
feed_item_love_anger_aggregate bba
ON fi.feed_item_id = bba.feed_item_id
LEFT JOIN
feed_item_likes_dislikes_aggregate lda
ON fi.feed_item_id = lda.feed_item_id
WHERE
(
title_vector @@ to_tsquery('scam')
OR summary_vector @@ to_tsquery('scam')
)
AND 'for' = ANY(tags)
AND likes > 0
ORDER BY
pubdate DESC,
feed_item_id DESC LIMIT 3;
EXPLAIN ANALYZE Page 1
Limit (cost=2.83..16.88 rows=3 width=233) (actual time=0.075..0.158 rows=3 loops=1)
-> Nested Loop Left Join (cost=2.83..124.53 rows=26 width=233) (actual time=0.074..0.157 rows=3 loops=1)
-> Nested Loop (cost=2.69..116.00 rows=26 width=217) (actual time=0.067..0.146 rows=3 loops=1)
Join Filter: (t.feed_item_id = fi.feed_item_id)
Rows Removed by Join Filter: 73
-> Index Scan using idx_feed_items_pubdate_feed_item_id_desc on feed_items fi (cost=0.14..68.77 rows=76 width=62) (actual time=0.016..0.023 rows=3 loops=1)
Filter: ((title_vector @@ to_tsquery('scam'::text)) OR (summary_vector @@ to_tsquery('scam'::text)))
Rows Removed by Filter: 1
-> Materialize (cost=2.55..8.56 rows=34 width=187) (actual time=0.016..0.037 rows=25 loops=3)
-> Hash Join (cost=2.55..8.39 rows=34 width=187) (actual time=0.044..0.091 rows=36 loops=1)
Hash Cond: (t.feed_item_id = lda.feed_item_id)
-> Seq Scan on feed_item_tags t (cost=0.00..5.25 rows=67 width=155) (actual time=0.009..0.043 rows=67 loops=1)
Filter: ('for'::text = ANY ((tags)::text[]))
Rows Removed by Filter: 33
-> Hash (cost=1.93..1.93 rows=50 width=32) (actual time=0.029..0.029 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on feed_item_likes_dislikes_aggregate lda (cost=0.00..1.93 rows=50 width=32) (actual time=0.004..0.013 rows=50 loops=1)
Filter: (likes > 0)
Rows Removed by Filter: 24
-> Index Scan using idx_feed_item_love_anger_aggregate on feed_item_love_anger_aggregate bba (cost=0.14..0.32 rows=1 width=32) (actual time=0.002..0.003 rows=0 loops=3)
Index Cond: (feed_item_id = fi.feed_item_id)
Planning Time: 0.601 ms
Execution Time: 0.195 ms
(23 rows)
It is doing a sequential scan twice, despite there being appropriate indexes on all the tables.
My Page N Query
Take the pubdate and feed_item_id of the 3rd result from the query above and load the next 3 results:
SELECT
fi.feed_item_id,
pubdate,
link,
title,
summary,
author,
feed_id,
likes,
dislikes,
love,
anger,
tags
FROM
feed_items fi
LEFT JOIN
feed_item_tags t
ON fi.feed_item_id = t.feed_item_id
LEFT JOIN
feed_item_love_anger_aggregate bba
ON fi.feed_item_id = bba.feed_item_id
LEFT JOIN
feed_item_likes_dislikes_aggregate lda
ON fi.feed_item_id = lda.feed_item_id
WHERE
(
pubdate,
fi.feed_item_id
)
< ('2020-06-19 19:50:00+05:30', 'bc5c8dfe-13a9-d97a-a328-0e5b8990c500')
AND
(
title_vector @@ to_tsquery('scam')
OR summary_vector @@ to_tsquery('scam')
)
AND 'for' = ANY(tags)
AND likes > 0
ORDER BY
pubdate DESC,
feed_item_id DESC LIMIT 3;
EXPLAIN ANALYZE Page N
Despite the extra filtering, it is still doing 2 sequential scans:
Limit (cost=2.83..17.13 rows=3 width=233) (actual time=0.082..0.199 rows=3 loops=1)
-> Nested Loop Left Join (cost=2.83..121.97 rows=25 width=233) (actual time=0.081..0.198 rows=3 loops=1)
-> Nested Loop (cost=2.69..113.67 rows=25 width=217) (actual time=0.073..0.185 rows=3 loops=1)
Join Filter: (t.feed_item_id = fi.feed_item_id)
Rows Removed by Join Filter: 183
-> Index Scan using idx_feed_items_pubdate_feed_item_id_desc on feed_items fi (cost=0.14..67.45 rows=74 width=62) (actual time=0.014..0.034 rows=6 loops=1)
Index Cond: (ROW(pubdate, feed_item_id) < ROW('2020-06-19 19:50:00+05:30'::timestamp with time zone, 'bc5c8dfe-13a9-d97a-a328-0e5b8990c500'::uuid))
Filter: ((title_vector @@ to_tsquery('scam'::text)) OR (summary_vector @@ to_tsquery('scam'::text)))
Rows Removed by Filter: 2
-> Materialize (cost=2.55..8.56 rows=34 width=187) (actual time=0.009..0.022 rows=31 loops=6)
-> Hash Join (cost=2.55..8.39 rows=34 width=187) (actual time=0.050..0.098 rows=36 loops=1)
Hash Cond: (t.feed_item_id = lda.feed_item_id)
-> Seq Scan on feed_item_tags t (cost=0.00..5.25 rows=67 width=155) (actual time=0.009..0.044 rows=67 loops=1)
Filter: ('for'::text = ANY ((tags)::text[]))
Rows Removed by Filter: 33
-> Hash (cost=1.93..1.93 rows=50 width=32) (actual time=0.028..0.029 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on feed_item_likes_dislikes_aggregate lda (cost=0.00..1.93 rows=50 width=32) (actual time=0.005..0.014 rows=50 loops=1)
Filter: (likes > 0)
Rows Removed by Filter: 24
-> Index Scan using idx_feed_item_love_anger_aggregate on feed_item_love_anger_aggregate bba (cost=0.14..0.32 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=3)
Index Cond: (feed_item_id = fi.feed_item_id)
Planning Time: 0.596 ms
Execution Time: 0.236 ms
(24 rows)
I have set up a fiddle with the required tables and indexes. Could someone tell me how to change the query so that it uses an index scan, or at least cuts the number of sequential scans down to 1?
Upvotes: 0
Views: 499
Reputation: 44383
You currently have no index on the tags table other than the GIN index. In your fiddle, if I create an index on feed_item_tags (feed_item_id) and then run ANALYZE, both seq scans go away. This is probably better than reformulating the query so that it can use the GIN index (as in my other answer), because this way makes more efficient use of the chance to stop early due to the LIMIT.
But really, what is the point of the "feed_item_tags" table? If you are going to have a child table to list tags, you would usually have one tag/parent_id combination per row. If you want an array of tags instead of a row per tag, why not just put the array directly into the parent table? There are sometimes reasons to have two tables in a 1:1 relationship, but not very often.
Upvotes: 1
Reputation: 44383
The construct 'for' = ANY(tags) cannot use the GIN index. To be able to use it, you would need to reformulate the condition into something like '{for}' <@ tags.

However, the planner will then choose not to use the index anyway, because the table is so small and the condition so non-selective. If you want to force it to use the index anyway, to prove it is capable of doing so, you can first set enable_seqscan = off.
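A sketch of that reformulation; the enable_seqscan setting is session-local and only for demonstrating that the index can be used, not something to keep in production:

```sql
-- Discourage sequential scans for this session only
SET enable_seqscan = off;

-- Array-containment form of the tag filter, which the GIN index on tags supports
EXPLAIN ANALYZE
SELECT feed_item_id
FROM feed_item_tags
WHERE '{for}' <@ tags;
```

Run `RESET enable_seqscan;` afterwards to restore the default planner behavior.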
Upvotes: 1