Sahil Malhotra
Sahil Malhotra

Reputation: 45

Why no similar ids in the results set when query with a correlated query inside where clause

I have a table with columns id, forename, surname, created (date).

I have a table such as the following:

ID | Forename | Surname | Created
---------------------------------
 1 | Tom      | Smith   | 2008-01-01
 1 | Tom      | Windsor | 2008-02-01
 2 | Anne     | Thorn   | 2008-01-05
 2 | Anne     | Baker   | 2008-03-01
 3 | Bill     | Sykes   | 2008-01-20

Basically, I want this to return the most recent name for each ID, so it would return:

ID | Forename | Surname | Created
---------------------------------
 1 | Tom      | Windsor | 2008-02-01
 2 | Anne     | Baker   | 2008-03-01
 3 | Bill     | Sykes   | 2008-01-20

I get the desired result with this query.

SELECT id, forename, surname, created
FROM name n
WHERE created = (SELECT MAX(created)
                              FROM name
                              GROUP BY id
                              HAVING id = n.id);

I am getting the result I want but I fail to understand WHY THE IDS ARE NOT BEING REPEATED in the result set. What I understand about correlated subquery is it takes one row from the outer query table and run the inner subquery. Shouldn't it repeat "id" when ids repeat in the outer query? Can someone explain to me what exactly is happening behind the scenes?

Upvotes: 2

Views: 42

Answers (2)

Barbaros Özhan
Barbaros Özhan

Reputation: 65228

You can also consider joining the tables by created vs max(created) column values :

SELECT n.id, n.forename, n.surname, n.created
  FROM name n
  RIGHT JOIN ( SELECT id, MAX(created) as created FROM name GROUP BY id ) t
    ON n.created = t.created;

or using IN operator :

SELECT id, forename, surname, created
  FROM name n
 WHERE ( id, created ) IN (SELECT id, MAX(created)
                             FROM name
                            GROUP BY id );

or using EXISTS with HAVING clause in the subquery :

SELECT id, forename, surname, created
  FROM name n
 WHERE EXISTS (SELECT id
                 FROM name
                GROUP BY id
                HAVING MAX(created) = n.created 
                );

Demo

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269753

First, your subquery does not need a GROUP BY. It is more commonly written as:

SELECT n.id, n.forename, n.surname, n.created
FROM name n
WHERE n.created = (SELECT MAX(n2.created)
                   FROM name n2
                   WHERE n2.id = n.id
                  );

You should get in the habit of qualifying all column references, especially when your query has multiple table references.

I think you are asking why this works. Well, each row in the outer query is tested for the condition. The condition is: "is my created the same as the maximum created for all rows in the name table with the same id". In your data, only one row per id matches that condition, so ids are not repeated.

Upvotes: 2

Related Questions