swood
swood

Reputation: 55

How to better optimise MySQL SELECT statements with specific search criteria?

I have developed an image searching tool to index a directory of all images, store the data and then retrieve the results based on a search.

This is working but due to the masses of data (approx 2.7 million rows of metadata for 150,000 images), it is taking a long time to return results (sometime >5 mins). I believe this is due to querying the second set of data in a foreach loop. I am looking to combine my sql into a single statement to better optimise (unless there is a another way!).

So far I am returning all the image_id's whereby the metadata matches the search term, and then using the returned image_id's to get all the metadata in a metadata table for each image. Here is the basic structure of the db:

image_id       filename         thumb 
=============  ============     ================= 
1              image_XYZ.jpg    image_XYZ_thumb.jpg
2              emoticon.png     emoticon_thumb.jpg
3              runner_bean.jpg  runner_bean_thumb.jpg

meta_id  meta_key     meta_value      image_id
=======  ========     ==========      ========
1        filetype     jpg             1
2        keyword      runner          1
3        height       600             1
4        filetype     png             2
5        filesize     5198413         2
6        description  smiley face     2
7        filetype     jpg             3
8        filesize     12485           3
9        description  runnerbean      3
10       keyword      runner          3
11       keyword      vegetable       3

1.

So far my code starts by getting all the images where the search term "runner" matches (So images 1 and 3)

SELECT image_id
FROM metadata
WHERE (meta_value LIKE '%runner%')
AND meta_key IN ('keyword', 'filename', 'description')
GROUP BY image_id

2.

Then looping through the returned result set and get the metadata for each:

$search_results = $imagesearch->search_images(); //object array from above sql

foreach ($search_results as $image) {
    $id = $image->image_id;
    $get_metadata = $imageget->get_metadata($id)
}

3.

SELECT *
FROM metadata 
WHERE image_id = $id

With the metadata I am then extracting the rows where the meta_key equals the data I want (e.g. height ) to retrieve the value etc.

So my final array would look something like this:

Array
(
    [1] => array(
        [image_id] => 1
        [filename] => image_XYZ.jpg
        [thumb] => image_XYZ_thumb.jpg
        [filetype] => jpg
        [keyword] => runner
        [height] => 600
    ),
    [2] => array(
        [image_id] => 3
        [filename] => runner_bean.jpg
        [thumb] => runner_bean_thumb.jpg
        [filetype] => jpg
        [filesize] => 12485
        [description] => runnerbean
        [keyword] => runner
        [keyword] => vegetable
    ),
)

My effort so far was to use a subquery to first get the images that matchup the search term, and then get the metadata for those images:

SELECT *
FROM metadata m
LEFT JOIN image i ON i.image_id = m.image_id
WHERE i.image_id IN (
    SELECT image_id
    FROM metadata
    WHERE $search
    AND meta_key IN ('keyword', 'filename', 'description')
    GROUP BY image_id
)
GROUP BY m.image_id

My group by is collapsing the results by the image id so I only get the first row for each image (filetype in above example). Is there a way to get a final array of images that match the search term and return the metadata for each in an more optimised way?

Any help would be appreciated

Upvotes: 0

Views: 50

Answers (1)

GMB
GMB

Reputation: 222482

I cannot see how the second GROUP BY in your query makes sense, I guess that you need to remove it. The query will then generate one row for each image metadata, sorted by image. This is not exactly the structure you expect (this would require to pivot the resultset), but you can loop over it and generated you array of arrays on the fly (everytime the image_id changes, start feeding a new sub-array in your global array).

SELECT m.*
FROM metadata m
WHERE m.image_id IN (
    SELECT DISTINCT image_id
    FROM metadata
    WHERE 
        meta_value LIKE '%runner%'
        AND meta_key IN ('keyword', 'filename', 'description')
)
ORDER BY m.image_id

NB : depending on how your data look like, SELECT DISTINCT in the inner query might be more efficient than SELECT.

Another option (that produces the same resultset) is to use a correlated subquery with a WHERE EXISTS clause, as follows :

SELECT m.*
FROM metadata m
WHERE EXISTS (
    SELECT 1
    FROM metadata
    WHERE 
        image_id = m.image_id
        AND meta_value LIKE '%runner%'
        AND meta_key IN ('keyword', 'filename', 'description')    
)
ORDER BY m.image_id

Upvotes: 1

Related Questions