Reputation: 55
I have developed an image searching tool to index a directory of all images, store the data and then retrieve the results based on a search.
This is working but due to the masses of data (approx 2.7 million rows of metadata for 150,000 images), it is taking a long time to return results (sometime >5 mins). I believe this is due to querying the second set of data in a foreach loop. I am looking to combine my sql into a single statement to better optimise (unless there is a another way!).
So far I am returning all the image_id's whereby the metadata matches the search term, and then using the returned image_id's to get all the metadata in a metadata table for each image. Here is the basic structure of the db:
image_id filename thumb
============= ============ =================
1 image_XYZ.jpg image_XYZ_thumb.jpg
2 emoticon.png emoticon_thumb.jpg
3 runner_bean.jpg runner_bean_thumb.jpg
meta_id meta_key meta_value image_id
======= ======== ========== ========
1 filetype jpg 1
2 keyword runner 1
3 height 600 1
4 filetype png 2
5 filesize 5198413 2
6 description smiley face 2
7 filetype jpg 3
8 filesize 12485 3
9 description runnerbean 3
10 keyword runner 3
11 keyword vegetable 3
So far my code starts by getting all the images where the search term "runner" matches (So images 1 and 3)
SELECT image_id
FROM metadata
WHERE (meta_value LIKE '%runner%')
AND meta_key IN ('keyword', 'filename', 'description')
GROUP BY image_id
Then looping through the returned result set and get the metadata for each:
$search_results = $imagesearch->search_images(); //object array from above sql
foreach ($search_results as $image) {
$id = $image->image_id;
$get_metadata = $imageget->get_metadata($id)
}
SELECT *
FROM metadata
WHERE image_id = $id
With the metadata I am then extracting the rows where the meta_key equals the data I want (e.g. height ) to retrieve the value etc.
So my final array would look something like this:
Array
(
[1] => array(
[image_id] => 1
[filename] => image_XYZ.jpg
[thumb] => image_XYZ_thumb.jpg
[filetype] => jpg
[keyword] => runner
[height] => 600
),
[2] => array(
[image_id] => 3
[filename] => runner_bean.jpg
[thumb] => runner_bean_thumb.jpg
[filetype] => jpg
[filesize] => 12485
[description] => runnerbean
[keyword] => runner
[keyword] => vegetable
),
)
My effort so far was to use a subquery to first get the images that matchup the search term, and then get the metadata for those images:
SELECT *
FROM metadata m
LEFT JOIN image i ON i.image_id = m.image_id
WHERE i.image_id IN (
SELECT image_id
FROM metadata
WHERE $search
AND meta_key IN ('keyword', 'filename', 'description')
GROUP BY image_id
)
GROUP BY m.image_id
My group by is collapsing the results by the image id so I only get the first row for each image (filetype in above example). Is there a way to get a final array of images that match the search term and return the metadata for each in an more optimised way?
Any help would be appreciated
Upvotes: 0
Views: 50
Reputation: 222482
I cannot see how the second GROUP BY
in your query makes sense, I guess that you need to remove it. The query will then generate one row for each image metadata, sorted by image. This is not exactly the structure you expect (this would require to pivot the resultset), but you can loop over it and generated you array of arrays on the fly (everytime the image_id
changes, start feeding a new sub-array in your global array).
SELECT m.*
FROM metadata m
WHERE m.image_id IN (
SELECT DISTINCT image_id
FROM metadata
WHERE
meta_value LIKE '%runner%'
AND meta_key IN ('keyword', 'filename', 'description')
)
ORDER BY m.image_id
NB : depending on how your data look like, SELECT DISTINCT
in the inner query might be more efficient than SELECT
.
Another option (that produces the same resultset) is to use a correlated subquery with a WHERE EXISTS
clause, as follows :
SELECT m.*
FROM metadata m
WHERE EXISTS (
SELECT 1
FROM metadata
WHERE
image_id = m.image_id
AND meta_value LIKE '%runner%'
AND meta_key IN ('keyword', 'filename', 'description')
)
ORDER BY m.image_id
Upvotes: 1