Ahsan
Ahsan

Reputation: 469

mysql returning duplicates on JOIN

i have two tables in a database. The table clients looks like this:

----------------------------
|id | name | age | gender  |
|---------------------------
|1  | CL1  | 22  |  M      |
|2  | CL2  | 23  |  M      |
|3  | CL3  | 24  |  M      |
|4  | CL4  | 25  |  F      |
|5  | CL5  | 26  |  NA     |
----------------------------

Now i have another table which relates to this client table , please note that the "id" in above Table is not AUTO_INCREMENT and is UNIQUE.

The second table is "images" which contain portfolio images of the clients and looks like this :

 ------------------------------
|id | client_id | url         |
|------------------------------
|1  | 1         | img1_1.jpg  | 
|2  | 1         | img1_2.jpg  | 
|3  | 1         | img1_3.jpg  | 
|4  | 2         | img2_1.jpg  | 
|5  | 2         | img2_2.jpg  |
-------------------------------

What i am basically achieving is that i want to pull all results from the client table which include name age gender etc and the first and one result from the images table which means, that if i Query it will have to show me the imag1_1.jpg from images table if i query for CL1 in clients table. For this i am doing something like this :

SELECT DISTINCT c.* , i.* FROM clients c LEFT JOIN images i ON i.client_id = c.id

This query returns me the results but then the results are more duplicates. I ain't getting or i am either confused for WHAT THE DISTINCT stands for then if it still returns the duplicates, or may be i am missing something.

Any help regarding would be appreciated.

Best,

Ahsan

Upvotes: 1

Views: 84

Answers (3)

spencer7593
spencer7593

Reputation: 108530

Here's one way to do it, using a correlated subquery:

SELECT c.*
     , ( SELECT i.url 
           FROM images i 
          WHERE i.client_id = c.id
          ORDER BY i.id
          LIMIT 1
       ) AS url 
  FROM clients c 

You don't really need to pull client_id from the images table, you already know it's value. If you need to return the id value from the images table, you'd need to add another correlated subquery in the select list

     , ( SELECT i.id
           FROM images i 
          WHERE i.client_id = c.id
          ORDER BY i.id
          LIMIT 1
       ) AS images_id

This approach can get expensive on large sets, but it performs reasonably for a limited number of rows returned from clients.

A more general query is of the form:

SELECT c.*
     , i.*
  FROM clients c
  LEFT
  JOIN ( SELECT m.client_id, MIN(m.id) as images_id
           FROM images m
          GROUP BY m.client_id
       ) n
  LEFT
  JOIN images i
    ON i.id = n.images_id

The inline view aliased as n will get a single id value from the images table for each client_id, and then we can use that id value to join back to the images table, to retrieve the entire row.

Performance of this form can be better, but with large sets, materializing the inline view aliased as n can take some time. If you have a predicate on the client.id table on the outer query, then for better performance, that predicate can be repeated on m.client_id inside the inline view as well, to limit the number of rows.

Upvotes: 2

Marc B
Marc B

Reputation: 360922

SELECT DISTINCT operates on a ROW basis. It checks all values in a row against all other rows. If even one value is different, then the row is not a duplicate and the whole thing will be output. If you want to force a single FIELD to be distinct, then you should GROUP BY that field instead.

Since you're doing a left join, you'll get all records from the clients table, and ANY matching records from the images table.

Upvotes: 1

eggyal
eggyal

Reputation: 126055

Assuming that by "first" you mean the record with the minimal images.id, you are after the groupwise minimum:

SELECT * FROM images NATURAL JOIN (
  SELECT   client_id, MIN(id) id
  FROM     images
  GROUP BY client_id
) t JOIN clients ON clients.id = images.client_id

Upvotes: 1

Related Questions