Reputation: 9873
First things first: what I am doing works perfectly fine. I'm just seeing if there is any room for improvement, and whether how I'm doing things is standard and/or follows good practices.
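These are the tables in question: item, topic, item_topic, and item_like_audit.
This is my use case: there are topics that can contain many items. Each item can have N likes. Each like is recorded in the item_like_audit table, so that it can be queried at a later time for ranking purposes.
This is what the query is trying to achieve, as annotated in its comments: match the items under a specific topic that have had "like" activity in the past 7 days, order them by like count from highest to lowest, and paginate the result.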
Can the following query or underlying schema be improved in any way (for performance or memory gains)?
Query:
SELECT DISTINCT item.* FROM item
/* Match items under this specific topic */
JOIN topic
ON topic.slug = ?
AND topic.deleted_at IS NULL
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.topic_id = topic.id
AND item_topic.deleted_at IS NULL
/* Match items that have had "like" activity in the past 7 days */
JOIN item_like_audit
ON item_like_audit.item_id = item.id
AND item_like_audit.created_at <= (CURRENT_DATE + INTERVAL 7 DAY)
WHERE item.deleted_at IS NULL
/* Order by highest like count to lowest */
ORDER BY item.like_count DESC
/* Pagination */
LIMIT ? OFFSET ?
Schema:
CREATE TABLE item (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL UNIQUE,
tagline VARCHAR(255) NOT NULL,
description VARCHAR(1000) NOT NULL,
price FLOAT NOT NULL,
like_count INT(10) NOT NULL DEFAULT 0,
images VARCHAR(1000) NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE item_like_audit (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
item_id INT(10) UNSIGNED NOT NULL,
user_id INT(10) UNSIGNED NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY `item_like_audit_created_at_index` (`created_at`)
);
CREATE TABLE topic (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE item_topic (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
item_id INT(10) NOT NULL,
topic_id INT(10) NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
Upvotes: 0
Views: 44
Reputation: 108400
Assuming item_topic(item_id,topic_id) is unique, we could do away with the "Using filesort" operation by getting rid of the DISTINCT keyword, and rewriting the check of item_like_audit as an EXISTS correlated subquery instead of a JOIN operation.
We'd have a guarantee of the uniqueness if we had
CREATE UNIQUE INDEX item_topic_UX1 ON item_topic (topic_id, item_id);
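Creating a unique index fails if the table already contains duplicate (topic_id, item_id) pairs, so it may be worth checking for them first. A minimal sketch, assuming soft-deleted rows should count as duplicates too (the alias row_count is only illustrative):

```sql
-- List every (topic_id, item_id) pair that appears more than once.
-- An empty result means the unique index can be created safely.
SELECT topic_id,
       item_id,
       COUNT(*) AS row_count
  FROM item_topic
 GROUP BY topic_id, item_id
HAVING COUNT(*) > 1;
```

If this returns rows, the duplicates would need to be cleaned up (or the uniqueness assumption dropped) before the index is added.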
We already have guarantees of uniqueness for topic(slug), topic(id), item(id), ...
SELECT item.*
FROM item
/* Match items under this specific topic */
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.deleted_at IS NULL
JOIN topic
ON topic.id = item_topic.topic_id
AND topic.slug = ?
AND topic.deleted_at IS NULL
WHERE item.deleted_at IS NULL
/* Match items that have had "like" activity in the past 7 days */
AND EXISTS ( SELECT 1
FROM item_like_audit
WHERE item_like_audit.item_id = item.id
AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
)
/* Order by highest like count to lowest */
ORDER BY item.like_count DESC
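One difference from the original query is worth spelling out. The original condition, created_at <= (CURRENT_DATE + INTERVAL 7 DAY), is satisfied by every row that is not dated more than a week into the future, so it does not restrict the result to recent likes. The rewrite compares against a cutoff 7 days in the past instead. A quick way to see both boundaries side by side (the column aliases are only illustrative):

```sql
-- cutoff_in_rewrite: midnight 7 days ago, used with >= in the rewrite
-- bound_in_original: 7 days from today, used with <= in the original
SELECT DATE(NOW()) + INTERVAL -7 DAY AS cutoff_in_rewrite,
       CURRENT_DATE + INTERVAL 7 DAY AS bound_in_original;
```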
For improved performance of the correlated subquery, we could create a covering index:
CREATE INDEX item_like_audit_IX1 ON item_like_audit (item_id, created_at);
We expect the unique index we created earlier will be used for the join operation, so that should also improve performance. We could get a covering index if we included the deleted_at column:
CREATE INDEX item_topic_IX2 ON item_topic (topic_id, item_id, deleted_at);
That index makes the unique index we created earlier redundant. If we still want to guarantee uniqueness, flip the order of the columns in the unique index:
DROP INDEX item_topic_UX1 ON item_topic ;
CREATE UNIQUE INDEX item_topic_UX1 ON item_topic (item_id,topic_id);
If we don't have guaranteed uniqueness, then I would favor adding a GROUP BY item.id clause over a DISTINCT keyword.
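A sketch of what that GROUP BY variant might look like, reusing the joins from the rewritten query and restoring the pagination from the question. It assumes MySQL 5.7 or later if ONLY_FULL_GROUP_BY is enabled, where selecting item.* is accepted because every column is functionally dependent on the primary key item.id:

```sql
SELECT item.*
  FROM item
  JOIN item_topic
    ON item_topic.item_id = item.id
   AND item_topic.deleted_at IS NULL
  JOIN topic
    ON topic.id = item_topic.topic_id
   AND topic.slug = ?
   AND topic.deleted_at IS NULL
 WHERE item.deleted_at IS NULL
   AND EXISTS ( SELECT 1
                  FROM item_like_audit
                 WHERE item_like_audit.item_id = item.id
                   AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
              )
 /* Collapse duplicate item_topic matches to one row per item */
 GROUP BY item.id
 ORDER BY item.like_count DESC
 LIMIT ? OFFSET ?
```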
Use EXPLAIN to see the execution plan, and verify that appropriate indexes are being used.
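For example, prefixing the rewritten query with EXPLAIN shows one row per table access. The slug literal 'example-topic' below is a hypothetical value substituted for the bind placeholder so the statement can be run directly:

```sql
EXPLAIN
SELECT item.*
  FROM item
  JOIN item_topic
    ON item_topic.item_id = item.id
   AND item_topic.deleted_at IS NULL
  JOIN topic
    ON topic.id = item_topic.topic_id
   AND topic.slug = 'example-topic'   -- hypothetical slug value
   AND topic.deleted_at IS NULL
 WHERE item.deleted_at IS NULL
   AND EXISTS ( SELECT 1
                  FROM item_like_audit
                 WHERE item_like_audit.item_id = item.id
                   AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
              )
 ORDER BY item.like_count DESC;
```

In the output, the key column names the index chosen for each table, and the Extra column reports notes such as "Using index", "Using temporary", or "Using filesort".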
If we can't guarantee uniqueness of (item_id,topic_id) from item_topic, and the overhead of the "Using filesort" operation for the GROUP BY operation is still too high, we could try checking the "matching topic" condition using an EXISTS. (But I don't hold out much hope that this will be any faster.)
SELECT item.*
FROM item
WHERE item.deleted_at IS NULL
AND EXISTS ( SELECT 1
FROM topic
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.topic_id = topic.id
AND item_topic.deleted_at IS NULL
JOIN item_like_audit
ON item_like_audit.item_id = item.id
AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
WHERE topic.slug = ?
AND topic.deleted_at IS NULL
)
ORDER BY item.like_count DESC
We are going to need to have suitable indexes available for performance of the correlated subquery.
Upvotes: 1
Reputation: 6449
Since you are only returning Item records, you could try this for possible improved performance:
select item.*
from item
where item.deleted_at is null
and exists (select 1 from item_topic
where item_topic.item_id = item.id
and item_topic.deleted_at is null
and exists (select 1 from topic
where topic.id = item_topic.topic_id
and topic.deleted_at is null
and topic.slug = ?))
and exists (select 1 from item_like_audit
where item_like_audit.item_id = item.id
and item_like_audit.created_at >= (current_date - interval 7 day))
order by item.like_count desc
This can potentially improve performance since it avoids the DISTINCT operator.
Upvotes: 1