Stuck
Stuck

Reputation: 12312

How could we make this query more efficient?

Given three tables profile, topic, message, I want to know for all USER profiles whether the last message to a topic was deleted.

In case the last message was not deleted I want to get 0.5 and 0 otherwise (ie. last message was deleted or profile did never message the topic).

My query has the correct result, but it takes ~25 seconds while returning ~15.000 result rows.

How can I make it more efficient? Ideally it is < 1 second.

SELECT
  p.id AS profile,
  topic.id AS topic,
  CASE WHEN m IS NULL THEN 0 ELSE 0.5 END AS value
FROM
  profile p
  CROSS JOIN topic
  -- latest non deleted message per topic
  LEFT JOIN message m ON (
    m.profile_id = p.id
    AND m.topic_id = topic.id
    AND m.deleted = FALSE
    AND NOT EXISTS (
      SELECT m2 FROM message m2
      WHERE m2.profile_id = p.id AND m.topic_id = m2.topic_id AND m.timestamp < m2.timestamp
    )
  )
WHERE 
  p.type = 'USER'
;

The result of EXPLAIN

Hash Left Join  (cost=395.85..1187910.62 rows=15204 width=48)
  Hash Cond: ((p.id = m.profile_id) AND (topic.id = m.topic_id))
  Join Filter: (NOT (SubPlan 1))
  ->  Nested Loop  (cost=0.00..213.67 rows=15204 width=24)
        ->  Seq Scan on profile p  (cost=0.00..22.36 rows=724 width=8)
              Filter: ((type)::text = 'USER'::text)
        ->  Materialize  (cost=0.00..1.31 rows=21 width=16)
              ->  Seq Scan on topic  (cost=0.00..1.21 rows=21 width=16)
  ->  Hash  (cost=223.15..223.15 rows=11513 width=89)
        ->  Seq Scan on message m  (cost=0.00..223.15 rows=11513 width=89)
              Filter: (NOT deleted)
  SubPlan 1
    ->  Seq Scan on message m2  (cost=0.00..309.51 rows=1 width=0)
          Filter: ((m."timestamp" < "timestamp") AND (profile_id = p.id) AND (m.topic_id = topic_id))

Side note: We need to execute the query quite often and the result will be inserted into another table (INSERT INTO ... SELECT (s. above)) for further processing.


SOLUTION

See answers!

After adding the indexes I executed all three versions intermixed 10 times. I am comparing on my local machine while other stuff is running, so it is not very scientific - but still results seem significant:

// results in ms
user          | min | max | avg  | portion of profiles that has type='USER'
Stuck         | 171 | 216 | ~180 | ~96%
Gordon Linoff | 148 | 172 | ~160 | ~96%
sticky bit    | 113 | 126 | ~120 | ~96% <-- winner
Gordon Linoff |  73 | 114 |  ~90 |  ~4% <-- winner when p.type='USER' is very selectiv

THANKS :)

Upvotes: 1

Views: 78

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271111

In case the last message was not deleted I want to get 0.5 and 0 otherwise (ie. last message was deleted or profile did never message the topic).

I'm thinking something similar to stickybit, but phrased a bit differently:

select p.id as profile, t.id as topic,
       (case when not (select m.deleted
                       from messages m
                       where m.profile_id = p.id and
                             m.topic_id = t.id
                       order by m.timestamp desc
                       limit 1
                      )
             then 0.5
             else 0
         end) as value
from profile p cross join
     topic t
where p.type = 'user';

The same indexes are called for:

  • messages(profile_id, topic_id, timestamp desc, deleted)
  • profile(type, id)

Why phrase it like this? distinct on is fast with an index. However, I suspect a simple index lookup is even faster.

Second, you don't specify how selective type = 'user' is. This version doesn't deal with messages on other profiles, only the profiles you care about.

Upvotes: 2

sticky bit
sticky bit

Reputation: 37497

Hmm, maybe try to rewrite it so that the left join uses a subquery only containing the deleted state of the last message per topic and profile using DISTINCT ON.

SELECT p.id profile,
       t.id topic,
       CASE
         WHEN coalesce(x.deleted,
                       true) THEN
           0
         ELSE
           0.5
       END value
       FROM profile p
            CROSS JOIN topic t
            LEFT JOIN (SELECT DISTINCT ON (m.profile_id,
                                           m.topic_id)
                              m.profile_id,
                              m.topic_id,
                              m.deleted
                              FROM message m
                              ORDER BY m.profile_id ASC,
                                       m.topic_id ASC,
                                       m.timestamp DESC) x
                      ON x.profile_id = p.id
                         AND x.topic_id = t.id
       WHERE p.type = 'USER';

For that the following indexes should be promising.

CREATE INDEX message_pid_tid_ts_d
             ON message (profile_id ASC,
                         topic_id ASC,
                         timestamp DESC,
                         deleted ASC);
CREATE INDEX profile_t_id
             ON profile (type ASC,
                         id ASC);

Upvotes: 1

Related Questions