Oskar Oskarsson
Oskar Oskarsson

Reputation: 1

Issue with DISTINCT in my mysql query

Hello i am trying to make a website where i want a photo album.

Thing is that this site will have 2 tables, 1 table that stores name of photo album (TBLalbum) and a unique id for that album (id,albumname). The second table (TBLphotos) stores information about uploaded photos and to what album each photo belongs(albumid,photoname and some more columns). Now i want to list all album names which is not so hard to do, but also in same sql query i want it to show the first photo in each album.

I tried this:

SELECT DISTINCT albumid 
FROM TBLphotos INNER JOIN TBLalbum 
ON TBLalbum.id=TBLphotos.albumid

this does not give me the possibility to print out the column photoname

and if i include it like :

SELECT DISTINCT albumid,photoname 
FROM TBLphotos INNER JOIN TBLalbum 
ON TBLalbum.id=TBLphotos.albumid

then it doesn't get distinct on just albumid.

A bit hard for me to explain, but can anyone tell me how to solve this?

Upvotes: 0

Views: 77

Answers (2)

Paul Griffin
Paul Griffin

Reputation: 2436

Try:

SELECT
  a.albumid,
  a.albumname,
  (SELECT photoname FROM TBLphotos p WHERE p.albumid = a.id LIMIT 1) AS photoname
FROM TBLalbum a

The subquery limits the result set for each album to just one record with the LIMIT clause. If you want a specific record from TBLphotos, you can amend the subquery to put things in a certain order or do whatever you need to do.

Note that DISTINCT is for filtering out duplicate rows, which is not (I think) what you are really trying to do here.

Upvotes: 1

axiac
axiac

Reputation: 72177

The correct query for your request is:

SELECT a.albumid, f.photoname 
FROM TBLalbum a
    INNER JOIN TBLphotos f          # "f" from "first" photo
      ON f.albumid = a.id           #          of this album
    LEFT JOIN TBLphotos o           # "o" from "other" photo
      ON o.albumid = f.albumid      #          of the same album
      AND o.photoid < f.photoid     # "o" is before "f"
WHERE o.photoid IS NULL             # no such "o" exists

Explanation

This query joins table TBLalbum (aliased as a from "album") with TBLphotos (aliased as f from "first photo of this album"). Because of the INNER JOIN, the albums without photos will not be included. Use LEFT JOIN instead if you want them included.

Next, it joins the table TBLalbum again (aliased as o from "other photo"). The first join condition (o.albumid = f.albumid) ensures the photos selected from o are in the same album as their matches from f. The second condition (o.photoid < f.photoid) pairs a row from f1 only with the rows from o having a smaller photoid (i.e. created earlier). Replace this with your own definition of earlier in order to get the first photo as expected.

Because the second JOIN is a LEFT JOIN, it will add to the result set all the rows from f1, including those that does not have any match in o because there is no photo in o that is earlier than the photo currently selected from f. For these rows, a row full of NULLs is used instead of the missing row from o.

The WHERE clause filters from the result set produced by the joins only the pairs (a, f, o) that have NULL values in o.photoid; i.e. when no "other photo" in the same album was found in o as being earlier than a certain photo from f. These are the first photos of each album you expect.

Notes

1 The LEFT JOIN combines the tables a and f already joined (left) with table o (right). The result set will contain all the combinations of rows produced by the INNER JOIN of a and f, at least once (and not all the rows from f as said above; I preferred to write it this way to make it simpler).

2 I assumed there is a field named photoid in table TBLphotos and it is the PK of the table (it's important to have distinct values). If you change the sorting criterion to define the first photo from the table (for example, using the upload time) and the field you use does not have distinct values then on tie the query returns all the photos having the same smallest value for the chosen column. You can fix this easily by keeping/adding into the ON clause the condition that uses photoid (or other field that could solve the tie).

3 You can put whatever fields you want/need into the SELECT clause but only from tables a and f. Nothing stops you to put fields from o but all of them are NULL. Read the entire explanation again if you didn't understand why.

4 If you want to understand how this query works then remove the WHERE clause, run it using a MySQL client and analyze the result set.

5 You'll find a lot of answers to similar questions on SO tagged . If they use GROUP BY avoid them because they are wrong and not conformant with the SQL standard (most of them work by accident). If they use sub-queries (with or without GROUP BY in the inner query) they are probably correct but most of the time they are slow. Sub-queries are difficult to optimize, and a lot of times it is not even possible. If they use MySQL variables then they are also slow. I don't know why MySQL variables are slow, I guess it's because the query planner cannot make the best use of indexes in this case.

Upvotes: 1

Related Questions