Jerry
Jerry

Reputation: 3608

Optimizing multiple JOINs in MySQL

I am dynamically building a SELECT statement that fetches an item and potentially several lists of related items.

The ultimate goal is an object in application space with arrays of ids for each of the related types.

Using a list of JOINs is pretty straightforward:

SELECT items.*, item_has_related1.related1_id, item_has_related2.related2_id, ...
FROM (items)
LEFT JOIN item_has_related1 ON item_has_related1.item_id = items.id
LEFT JOIN item_has_related2 ON item_has_related2.item_id = items.id
... potentially many more
WHERE items.id = $itemId;

LEFT JOIN is used because some relationships might be empty.

The most obvious problem with this is that the number of rows that are returned is the product of the number matches in all the joins. With just a few joined tables that number could get very large. If there were five tables with six matches each, there would be 6^5 rows! A secondary problem is that processing the return rows is more complex, as I have to dig out the unique values in each column.

As an alternative, I have written something like this, which essentially does a separate query for each JOIN:

SELECT items.*, item_has_related_1.related1_id, NULL as related2_id, ...
FROM (items)
JOIN item_has_related_1 ON item_has_related_1.item_id = items.id
WHERE items.id = $itemId

UNION

SELECT items.*, NULL as related1_id, item_has_related_2.related2_id, ...
FROM (items)
JOIN item_has_related_2 ON item_has_related_2.item_id = items.id
WHERE items.id = $itemId

The number of rows returned this way is the sum of the number of matches in all joins. However, query prep time is much longer and so for smaller datasets this method is less efficient. I have tried to empirically determine the definition of "smaller", but with my test data I'm not sure if my results are meaningful.

Is there a more efficient way to perform multiple JOINs and combine the results, or is there another approach to this problem?

EDITED TO ADD: Barmar has the right answer to my question, but my very next step was expanding the where clause to return multiple rows. Referring to this question, my code ended up looking like this:

SELECT items.*,
(SELECT GROUP_CONCAT(related1_id) FROM item_has_related_1 WHERE item_id = items.id) as related1Ids,
(SELECT GROUP_CONCAT(related2_id) FROM item_has_related_2 WHERE item_id = items.id) as related2Ids,
...
FROM items
WHERE <where criteria>

Upvotes: 0

Views: 222

Answers (2)

Barmar
Barmar

Reputation: 782693

You can use GROUP_CONCAT to get all the related items from each table into a comma-separated list in the result.

SELECT items.*, related1_ids, related2_ids, ...
FROM items
LEFT JOIN (
    SELECT item_id, GROUP_CONCAT(related1_id) AS related1_ids
    FROM item_has_related_1
    WHERE item_id = $itemId
) AS r1 ON items.id = r1.item_id
LEFT JOIN (
    SELECT item_id, GROUP_CONCAT(related2_id) AS related2_ids
    FROM item_has_related_2
    WHERE item_id = $itemId
) AS r2 ON items.id = r2.item_id
...

Later you can split them up in the application language.

Upvotes: 1

NiVeR
NiVeR

Reputation: 9806

You can simply write the query with inner joins like this:

SELECT items.*, item_has_related1.related1_id, item_has_related2.related2_id, ...
FROM (items)
INNER JOIN item_has_related1 ON item_has_related1.item_id = items.id
INNER JOIN item_has_related2 ON item_has_related2.item_id = items.id
... potentially many more
WHERE items.id = $itemId;

This query will have as many rows as there are matches of $itemId in the other tables.

The thing is, if you will ever need all those data listed in the select statement, you will have to do the work of joining all queries, even if they are separate, which will not gain anything w.r.t the approach of doing all joins together as listed here.

Upvotes: 0

Related Questions