Johann
Johann

Reputation: 23

order by makes query slow

I have two tables :

video (ID, TITLE, ..., UPLOADED_DATE)
join_video_category (ID (not used), ID_VIDEO_ ID_CATEGORY)

rows in video : 4 500 000 | rows in join_video_category : 5 800 000

1 video can have many category.

I have a query works perfectly, 20 ms max to get result :

SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
LIMIT 1000;

This query take 1000 video, the order is not important.

BUT, when i would like to get 10 latest video from a category, my query take arround 30-40 seconds :

SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
ORDER BY UPLOADED_DATE DESC LIMIT 10;

I have index on ID_CATEGORY, ID_VIDEO, UPLOADED_DATE, PRIMARY ON ID video and join_video_category.

I have tested it with JOIN on my query, it's the same result.

Upvotes: 0

Views: 157

Answers (3)

Rick James
Rick James

Reputation: 142278

If it is 1:Many, don't use an extra table between Video and Category. However, your row counts imply that it is Many:Many.

If it is 1:Many, simply have the category_id in the Video table, then simplify all the queries.

If it is Many:Many, then be sure to use this pattern for the junction table:

CREATE TABLE map_video_category (
    video_id ...,
    category_id ...,
    PRIMARY KEY(video_id, category_id),  -- both ids, one direction
    INDEX      (category_id, video_id)   -- both ids, the other direction
) ENGINE=InnoDB;  -- significantly better than MyISAM on INDEX handling here

The ID that you mentioned is a waste. The composite keys are optimal for all situations, and will improve performance in most situations.

Do not use IN ( SELECT ... ); the optimizer does a poor job of optimizing it. Change to a JOIN, LEFT JOIN, EXISTS, or some other construct.

Upvotes: 0

Jawad Azzam
Jawad Azzam

Reputation: 79

solution #1: replacing "in" with "exists" would improve the performance, please try the below query.

SELECT * FROM video WHERE exists
(SELECT * FROM join_video_category WHERE ID_CATEGORY=11 AND join_video_category.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;

solution #2:

1) create tem_table

CREATE TABLE TEMP_TABLE AS SELECT * FROM join_video_category WHERE ID_CATEGORY=11;

2) use the temp table in solution #1

SELECT * FROM video WHERE exists
    (SELECT * FROM temp_table WHERE temp_table.ID_VIDEO = video.ID)
    ORDER BY UPLOADED_DATE DESC LIMIT 10;

Good Luck!!

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269663

First, the comparisons are to two very different queries. The first returns a bunch of videos whenever it encounters them. The second has to read all the videos and then sort them.

Try rewriting this as a JOIN:

SELECT v.*
FROM video v JOIN
     join_video_category vc
     ON v.id = bc.id_video
WHERE vc.ID_CATEGORY = 11
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;

That may or may not help. You have a lot of data and so you might have a lot of videos for a given category. If so, a where clause that gets more recent data might really help:

SELECT v.*
FROM video v JOIN
     join_video_category vc
     ON v.id = bc.id_video
WHERE vc.ID_CATEGORY = 11 AND v.UPLOADED_DATE >= '2015-01-01'
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;

Finally, if that doesn't work, consider adding something like UPLOADED_DATE into join_video_category. Then, this query should blaze:

select vc.video_id
from join_vdeo_category vc
where vc.ID_CATEGORY = 11 
order by vc.UPLOADED_DATE desc
limit 10;

with an index on join_video_category(id_category, uploaded_date, video_id).

Upvotes: 1

Related Questions