Benji XVI
Benji XVI

Reputation: 2310

Instead of independently joining multiple tables, use separate queries?

I often find myself performing a couple of independent joins off a table. For instance, say we have the table collections, which has independent one-to-N relationships with both photos and songs, where N is from zero to many.

Now, say we want to get a collection, and both its (independently) associated photos and songs.

I would typically use something like this:

SELECT
    collections.collectionid as collectionid,
    photos.name as photo_name,
    songs.name  as song_name

FROM collections
    LEFT JOIN photos ON collections.collectionid = photos.collectionid
    LEFT JOIN songs  ON collections.collectionid = songs.collectionid

WHERE collections.collectionid = 14

Of course, left-joining one table to two other tables, if the first join results in M rows and the second in N rows, gives M * N rows. This would seem suboptimal in terms of database-traffic and performance.

+--------------+------------+-----------+
| collectionid | photo_name | song_name |
+--------------+------------+-----------+
| 14           | 'x'        | 'a'       | \
| 14           | 'x'        | 'b'       |  - Each photo is returned 3 times,
| 14           | 'x'        | 'c'       | /  because 3 songs are returned.
| 14           | 'y'        | 'a'       | \
| 14           | 'y'        | 'b'       | 
| 14           | 'y'        | 'c'       | /
+--------------+------------+-----------+

Alternatively, you can perform two selects: two separate queries, each joining collections to a different table, giving M + N rows:

SELECT
    collections.collectionid as collectionid
    song.name as song_name
FROM collections
    LEFT JOIN songs on collections.collectionid = songs.collectionid
WHERE collections.collectionid = 14

and:

SELECT
    collections.collectionid as collectionid
    photos.name as photo_name
FROM collections
    LEFT JOIN photos on collections.collectionid = photos.collectionid
WHERE collections.collectionid = 14

giving:

+--------------+------------+    +--------------+------------+
| collectionid | song_name  |    | collectionid | photo_name |
+--------------+------------+    +--------------+------------+
| 14           | 'a'        |    | 14           | 'x'        |
| 14           | 'b'        |    | 14           | 'y'        |
| 14           | 'c'        |    +--------------+------------+
+--------------+------------+

My question: What is the best way to handle this?

Neither of the above seems optimal. So, is there another way that results in M + N rows, yet can be done in a single query?

Upvotes: 5

Views: 5924

Answers (2)

Larry Lustig
Larry Lustig

Reputation: 50970

Your first option (two independent JOINs) doesn't seem to provide you with a very useful result set (because the two subsidiary tables produce a semi-cartesian product and you have to de-duplicate the results in your application code).

The second option (two separate queries) is okay, unless you want to treat the results of the two queries as a single set for presentation purposes (for instance, sort them all together by a date field).

The best solution, I think, is to combine the two queries into one with UNION ALL, producing a single result set with only the rows you actually want:

SELECT
  collections.collectionid as collectionid,
  photos.name as photo_name,
  'photo' as document_type
FROM collections
  LEFT JOIN photos on collections.collectionid = photos.collectionid
WHERE collections.collectionid = 14
UNION ALL
SELECT
  collections.collectionid as collectionid,
  song.name as photo_name
  'song' as document_type
FROM collections
  LEFT JOIN songs on collections.collectionid = songs.collectionid
WHERE collections.collectionid = 14

This kind of result set can be ORDERed BY any field across the entire, combined set of records, allowing (for instance) to get the 20 most recent documents attached to the collection regardless of what type they are.

Upvotes: 5

Brent Arias
Brent Arias

Reputation: 30155

It does seem that the relationship between photos and permissions is undefined, which results in the cross-join you speak of. Yes, at face value, doing two queries is better than what you have. However, the real question is why do photos and permissions have no key-based relationship?

But perhaps I'm not understanding your overall schema. Perhaps all the permissions pertain to a single user. If yes, then I'd consider placing all the permissions on a single row (of several columns or in an XML blob), rather than in several rows. Doing so would permit a single query to fetch all the values without resulting in an unintentional cross-join.

Upvotes: 0

Related Questions