Reputation: 19
The code should select the next tag with the least amount of unrated images.
The flowing columns has indexes:
The code should go through every row and check how many duplicate tag_name-column there are. Afterwards it should return the tag with the least number of duplicate rows. The code should ignore any rows where pick = TRUE
or reject = TRUE
Meaning the code should only include files if pick = FALSE
and reject = FALSE
At the moment my code is the folowing.
SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags JOIN images ON image_tags.filename = images.filename WHERE images.pick = FALSE AND images.reject = FALSE GROUP BY image_tags.tag_name ORDER BY number_of_rows
Because of the size of the database the query takes a lot of time (sometimes as much as 30 seconds). Is there a way to make the query run faster, or is there another query that would have less execution time?
Upvotes: 0
Views: 1715
Reputation: 1270583
This is your query:
SELECT it.tag_name, COUNT(*) as number_of_rows
FROM image_tags it JOIN
images i
ON it.filename = i.filename
WHERE i.pick = FALSE AND i.reject = FALSE
GROUP BY it.tag_name
ORDER BY number_of_rows;
The first thing to try is an index on images(pick, reject, filename)
. This probably won't help, because boolean columns don't usually whittle down the data enough. But it is worth a try.
The second thing to try is this query:
SELECT it.tag_name,
(SELECT COUNT(*)
FROM images i
WHERE it.filename = i.filename AND
i.pick = FALSE AND
i.reject = FALSE
) as number_of_rows
FROM image_tags it
GROUP BY it.tag_name
ORDER BY number_of_rows;
This moves the aggregation to a subquery, which sometimes performs better in MySQL. For this, you want an index on images(filename, pick, reject)
. Also note that this returns all tags, not just those that have a matching image. You can filter using HAVING number_of_rows > 0
if you want the same results as your version.
EDIT:
Oh, I see. You can try this version:
SELECT t.tag_name,
(SELECT COUNT(*)
FROM image_tags it JOIN
images i
ON it.filename = i.filename
WHERE it.tag_name = t.tag_name AND
i.pick = FALSE AND
i.reject = FALSE
) as number_of_rows
FROM (SELECT DISTINCT tag_name FROM image_tags) t
ORDER BY number_of_rows;
For this, you want indexes on image_tags(tag_name, filename)
and images(filename, pick, reject)
. However, I'm not sure this will have much improved performance.
Upvotes: 1
Reputation: 11209
Since the images table has a int primary key, it would make more sense to have image_id as a foreign key in image_tags table, along with an index on that column. You will gain in speed and storage space.
Upvotes: 0
Reputation: 77896
Make sure you have proper index created for the columns used in JOIN ON
condition; WHERE
condition ; ORDER BY
. In your case you should have index created on
image_tags.filename
and images.filename
images.pick
; images.reject
Also consider running EXPLAIN
for the query and see what query plan you are getting which will give a better idea.
EXPLAIN SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags ....
Upvotes: 0