Mathias Nervik
Mathias Nervik

Reputation: 19

Speed up a slow SQL query on a large database

This is a picture-type webpage

The code should select the next tag with the least amount of unrated images.

Table info

table_name = images

table_name = image_tags

Indexes

The flowing columns has indexes:

Functionality

The code should go through every row and check how many duplicate tag_name-column there are. Afterwards it should return the tag with the least number of duplicate rows. The code should ignore any rows where pick = TRUE or reject = TRUE

Meaning the code should only include files if pick = FALSE and reject = FALSE

Code

At the moment my code is the folowing.

SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags JOIN images ON image_tags.filename = images.filename WHERE images.pick = FALSE AND images.reject = FALSE GROUP BY image_tags.tag_name ORDER BY number_of_rows

Question

Because of the size of the database the query takes a lot of time (sometimes as much as 30 seconds). Is there a way to make the query run faster, or is there another query that would have less execution time?

Size

Upvotes: 0

Views: 1715

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270583

This is your query:

SELECT it.tag_name, COUNT(*) as number_of_rows
FROM image_tags it JOIN
     images i
     ON it.filename = i.filename
WHERE i.pick = FALSE AND i.reject = FALSE
GROUP BY it.tag_name
ORDER BY number_of_rows;

The first thing to try is an index on images(pick, reject, filename). This probably won't help, because boolean columns don't usually whittle down the data enough. But it is worth a try.

The second thing to try is this query:

SELECT it.tag_name,
       (SELECT COUNT(*) 
        FROM images i
        WHERE it.filename = i.filename AND
              i.pick = FALSE AND
              i.reject = FALSE
       ) as number_of_rows
FROM image_tags it 
GROUP BY it.tag_name
ORDER BY number_of_rows;

This moves the aggregation to a subquery, which sometimes performs better in MySQL. For this, you want an index on images(filename, pick, reject). Also note that this returns all tags, not just those that have a matching image. You can filter using HAVING number_of_rows > 0 if you want the same results as your version.

EDIT:

Oh, I see. You can try this version:

SELECT t.tag_name,
       (SELECT COUNT(*) 
        FROM image_tags it JOIN
             images i
             ON it.filename = i.filename
        WHERE it.tag_name = t.tag_name AND
              i.pick = FALSE AND
              i.reject = FALSE
       ) as number_of_rows
FROM (SELECT DISTINCT tag_name FROM image_tags) t 
ORDER BY number_of_rows;

For this, you want indexes on image_tags(tag_name, filename) and images(filename, pick, reject). However, I'm not sure this will have much improved performance.

Upvotes: 1

Tarik
Tarik

Reputation: 11209

Since the images table has a int primary key, it would make more sense to have image_id as a foreign key in image_tags table, along with an index on that column. You will gain in speed and storage space.

Upvotes: 0

Rahul
Rahul

Reputation: 77896

Make sure you have proper index created for the columns used in JOIN ON condition; WHERE condition ; ORDER BY. In your case you should have index created on

image_tags.filename and images.filename images.pick; images.reject

Also consider running EXPLAIN for the query and see what query plan you are getting which will give a better idea.

EXPLAIN SELECT image_tags.tag_name, COUNT(*) as number_of_rows FROM image_tags ....

Upvotes: 0

Related Questions