Reputation: 12312
Given three tables profile
, topic
, message
, I want to know for all USER
profiles whether the last message to a topic was deleted.
In case the last message was not deleted I want to get 0.5
and 0
otherwise (ie. last message was deleted or profile did never message the topic).
My query has the correct result, but it takes ~25 seconds while returning ~15.000 result rows.
How can I make it more efficient? Ideally it is < 1 second.
SELECT
p.id AS profile,
topic.id AS topic,
CASE WHEN m IS NULL THEN 0 ELSE 0.5 END AS value
FROM
profile p
CROSS JOIN topic
-- latest non deleted message per topic
LEFT JOIN message m ON (
m.profile_id = p.id
AND m.topic_id = topic.id
AND m.deleted = FALSE
AND NOT EXISTS (
SELECT m2 FROM message m2
WHERE m2.profile_id = p.id AND m.topic_id = m2.topic_id AND m.timestamp < m2.timestamp
)
)
WHERE
p.type = 'USER'
;
The result of EXPLAIN
Hash Left Join (cost=395.85..1187910.62 rows=15204 width=48)
Hash Cond: ((p.id = m.profile_id) AND (topic.id = m.topic_id))
Join Filter: (NOT (SubPlan 1))
-> Nested Loop (cost=0.00..213.67 rows=15204 width=24)
-> Seq Scan on profile p (cost=0.00..22.36 rows=724 width=8)
Filter: ((type)::text = 'USER'::text)
-> Materialize (cost=0.00..1.31 rows=21 width=16)
-> Seq Scan on topic (cost=0.00..1.21 rows=21 width=16)
-> Hash (cost=223.15..223.15 rows=11513 width=89)
-> Seq Scan on message m (cost=0.00..223.15 rows=11513 width=89)
Filter: (NOT deleted)
SubPlan 1
-> Seq Scan on message m2 (cost=0.00..309.51 rows=1 width=0)
Filter: ((m."timestamp" < "timestamp") AND (profile_id = p.id) AND (m.topic_id = topic_id))
Side note: We need to execute the query quite often and the result will be inserted into another table (INSERT INTO ... SELECT (s. above)
) for further processing.
SOLUTION
See answers!
After adding the indexes I executed all three versions intermixed 10 times. I am comparing on my local machine while other stuff is running, so it is not very scientific - but still results seem significant:
// results in ms
user | min | max | avg | portion of profiles that has type='USER'
Stuck | 171 | 216 | ~180 | ~96%
Gordon Linoff | 148 | 172 | ~160 | ~96%
sticky bit | 113 | 126 | ~120 | ~96% <-- winner
Gordon Linoff | 73 | 114 | ~90 | ~4% <-- winner when p.type='USER' is very selectiv
THANKS :)
Upvotes: 1
Views: 78
Reputation: 1271111
In case the last message was not deleted I want to get 0.5 and 0 otherwise (ie. last message was deleted or profile did never message the topic).
I'm thinking something similar to stickybit, but phrased a bit differently:
select p.id as profile, t.id as topic,
(case when not (select m.deleted
from messages m
where m.profile_id = p.id and
m.topic_id = t.id
order by m.timestamp desc
limit 1
)
then 0.5
else 0
end) as value
from profile p cross join
topic t
where p.type = 'user';
The same indexes are called for:
messages(profile_id, topic_id, timestamp desc, deleted)
profile(type, id)
Why phrase it like this? distinct on
is fast with an index. However, I suspect a simple index lookup is even faster.
Second, you don't specify how selective type = 'user'
is. This version doesn't deal with messages on other profiles, only the profiles you care about.
Upvotes: 2
Reputation: 37497
Hmm, maybe try to rewrite it so that the left join uses a subquery only containing the deleted state of the last message per topic and profile using DISTINCT ON
.
SELECT p.id profile,
t.id topic,
CASE
WHEN coalesce(x.deleted,
true) THEN
0
ELSE
0.5
END value
FROM profile p
CROSS JOIN topic t
LEFT JOIN (SELECT DISTINCT ON (m.profile_id,
m.topic_id)
m.profile_id,
m.topic_id,
m.deleted
FROM message m
ORDER BY m.profile_id ASC,
m.topic_id ASC,
m.timestamp DESC) x
ON x.profile_id = p.id
AND x.topic_id = t.id
WHERE p.type = 'USER';
For that the following indexes should be promising.
CREATE INDEX message_pid_tid_ts_d
ON message (profile_id ASC,
topic_id ASC,
timestamp DESC,
deleted ASC);
CREATE INDEX profile_t_id
ON profile (type ASC,
id ASC);
Upvotes: 1