Garrett

Reputation: 11725

Postgres select on large (15m rows) table extremely slow, even with index

I'm trying to run EXPLAIN ANALYZE, but the query is so slow that it never finishes. If it does finish, I'll post the results; for now, here is the plain EXPLAIN output.

Query:

EXPLAIN SELECT
    *
FROM
    "Posts" AS "Post"
WHERE
    (
        "Post"."featurePostOnDate" > '2020-06-25 19:28:07.816 +00:00'
        OR (
            "Post"."featurePostOnDate" IS NULL
            AND "Post"."userId" IN (6863684)
        )
    )
AND "Post"."private" IS NULL
ORDER BY
    "Post"."featurePostOnDate" DESC NULLS LAST,
    "Post"."createdAt" DESC NULLS LAST
LIMIT 10;

Result:

Limit  (cost=0.56..110.92 rows=10 width=1136)
  ->  Index Scan using posts_updated_following_feed_idx on "Posts" "Post"  (cost=0.56..284949.60 rows=25819 width=1136)
        Filter: (("featurePostOnDate" > '2020-06-25 19:28:07.816+00'::timestamp with time zone) OR (("featurePostOnDate" IS NULL) AND ("userId" = 6863684)))

Index:

CREATE INDEX  "posts_updated_following_feed_idx" ON "public"."Posts" USING btree (
    "featurePostOnDate" DESC NULLS LAST,
    "createdAt" DESC NULLS LAST
)
WHERE
    private IS NULL;

Upvotes: 1

Views: 232

Answers (2)

jjanes

Reputation: 44157

You would need to write it as two separate queries, one for each branch of the OR. Apply the ORDER BY and LIMIT to each query, then combine the results and apply the ORDER BY and LIMIT again to the combined set. If the first branch already finds ten rows, the second one doesn't need to run at all, because with NULLS LAST all non-NULL dates come first.
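A minimal sketch of that rewrite as a single statement, assuming the table and columns from the question. The two branches cannot overlap (one requires a non-NULL date, the other a NULL date), so UNION ALL is safe here. Note that this form always runs both branches; skipping the second branch when the first returns ten rows would have to be done in application code.

```sql
-- Branch 1: featured posts after the cutoff.
-- The ORDER BY matches the existing partial index, so this branch
-- should be able to read just the first rows from it.
(
    SELECT *
    FROM "Posts" AS "Post"
    WHERE "Post"."featurePostOnDate" > '2020-06-25 19:28:07.816 +00:00'
      AND "Post"."private" IS NULL
    ORDER BY "Post"."featurePostOnDate" DESC NULLS LAST,
             "Post"."createdAt" DESC NULLS LAST
    LIMIT 10
)
UNION ALL
-- Branch 2: this user's posts with no feature date.
-- All rows here have a NULL "featurePostOnDate", so only "createdAt"
-- matters for ordering within the branch.
(
    SELECT *
    FROM "Posts" AS "Post"
    WHERE "Post"."featurePostOnDate" IS NULL
      AND "Post"."userId" = 6863684
      AND "Post"."private" IS NULL
    ORDER BY "Post"."createdAt" DESC NULLS LAST
    LIMIT 10
)
-- Re-apply the original ordering and limit to the combined rows.
ORDER BY "featurePostOnDate" DESC NULLS LAST,
         "createdAt" DESC NULLS LAST
LIMIT 10;
```

Whether the second branch is fast depends on an index that supports the "userId" lookup, which the question does not show; checking each branch separately with EXPLAIN should reveal which one is the bottleneck.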

Upvotes: 1

namar sood

Reputation: 1590

With 15m rows, EXPLAIN ANALYZE is slow because the ANALYZE option actually executes the query; see https://www.postgresql.org/docs/9.1/sql-explain.html.

Also, your WHERE clause uses a field ("userId") that is not in the index:

WHERE
    (
        "Post"."featurePostOnDate" > '2020-06-25 19:28:07.816 +00:00'
        OR (
            "Post"."featurePostOnDate" IS NULL
            AND "Post"."userId" IN (6863684)
        )
    )
AND "Post"."private" IS NULL

So the index scan cannot use these conditions to narrow its search; instead it applies them as a filter to each row it reads:

Filter: (("featurePostOnDate" > '2020-06-25 19:28:07.816+00'::timestamp with time zone) OR (("featurePostOnDate" IS NULL) AND ("userId" = 6863684)))

That might be the reason your query is slow.

You might need compound indexes on (featurePostOnDate, userId, private) and (featurePostOnDate, private).
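As a rough sketch, the two suggested indexes could be created like this. The index names are hypothetical, and whether the planner actually uses either index for the OR query above is something to confirm with EXPLAIN.

```sql
-- Hypothetical index names; the column lists follow the suggestion above.
CREATE INDEX posts_feature_user_private_idx
    ON "Posts" ("featurePostOnDate", "userId", "private");

CREATE INDEX posts_feature_private_idx
    ON "Posts" ("featurePostOnDate", "private");
```

On a 15m-row table, CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds, at the cost of a slower build.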

I hope this helps.

Upvotes: 1
