Reputation: 13358
I have data that looks like this:
entities
id name
1 Apple
2 Orange
3 Banana
Periodically, a process will run and give a score to each entity. The process generates the data and adds it to a scores table like so:
scores
id entity_id score date_added
1 1 10 1/2/09
2 2 10 1/2/09
3 1 15 1/3/09
4 2 10 1/03/09
5 1 15 1/4/09
6 2 15 1/4/09
7 3 22 1/4/09
I want to be able to select all of the entities along with the most recent recorded score for each resulting in some data like this:
entities
id name score date_added
1 Apple 15 1/4/09
2 Orange 15 1/4/09
3 Banana 15 1/4/09
I can get the data for a single entity using this query:
SELECT entities.*,
scores.score,
scores.date_added
FROM entities
INNER JOIN scores
ON entities.id = scores.entity_id
WHERE entities.id = ?
ORDER BY scores.date_added DESC
LIMIT 1
But I'm at a loss for how to select the same for all entities. Perhaps it's staring me in the face?
Thank you very kindly for taking the time.
Thanks for the great responses. I'll give it a few days to see if a preferred solution bubbles up then I'll select the answer.
UPDATE: I've tried out several of the proposed solutions, the main issue I'm facing now is that if an entity does not yet have a generated score they don't appear in the list.
What would the SQL look like to ensure that all entities are returned, even if they don't have any score posted yet?
UPDATE: Answer selected. Thanks everyone!
Upvotes: 51
Views: 40767
Reputation: 1507
You can also do this today in most RDBMSs (Oracle, PostgreSQL, SQL Server) with a natural query using window functions such as ROW_NUMBER:
SELECT id, name, score, date_added FROM (
SELECT e.id, e.name, s.score, s.date_added,
ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY s.date_added DESC) rn
FROM Entities e INNER JOIN Scores s ON e.id = s.entity_id
) tmp WHERE rn = 1;
Upvotes: 1
Reputation: 4187
I know this is a old question, just thought I'd add a approach no-one has mentioned yet, Cross Apply
or Outer Apply
. These are available in SQL Server 2005 (the database type is not tagged in this question) Or higher
Using the temporary tables
DECLARE @Entities TABLE(Id INT PRIMARY KEY, name NVARCHAR(MAX))
INSERT INTO @Entities
VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Banana'), (4, 'Cherry')
DECLARE @Scores TABLE(Id INT PRIMARY KEY, Entity_Id INT, Score INT, Date_Added DATE)
INSERT INTO @Scores
VALUES (1,1,10,'2009-02-01'),
(2,2,10,'2009-02-01'),
(3,1,15,'2009-02-01'),
(4,2,10,'2009-03-01'),
(5,1,15,'2009-04-01'),
(6,2,15,'2009-04-01'),
(7,3,22,'2009-04-01')
You could use
SELECT E.Id, E.name, S.Score, S.Date_Added
FROM @Entities E
CROSS APPLY
(
SELECT TOP 1 *
FROM @Scores Sc
WHERE Sc.Entity_Id = E.Id
ORDER BY sc.Score DESC
) AS S
to get the desired results. The equivilent to allow entities without scores would be
SELECT E.Id, E.name, S.Score, S.Date_Added
FROM @Entities E
OUTER APPLY
(
SELECT TOP 1 *
FROM @Scores Sc
WHERE Sc.Entity_Id = E.Id
ORDER BY sc.Score DESC
) AS S
Upvotes: 4
Reputation: 39393
approach 1
SELECT entities.*,
scores.score,
scores.date_added
FROM entities
INNER JOIN scores
ON entities.id = scores.entity_id
WHERE scores.date_added =
(SELECT max(date_added) FROM scores where entity_id = entities.id)
Upvotes: 5
Reputation: 39393
approach 2
query cost relative to batch:
approach 1: 22% -- faster
approach 2: 22% -- faster
approach 3: 27%
approach 4: 27%
SELECT entities.*,
scores.score,
scores.date_added
FROM entities
INNER JOIN scores
ON entities.id = scores.entity_id
inner join
(
SELECT
entity_id, max(date_added) as recent_date
FROM scores
group by entity_id
) as y on entities.id = y.entity_id and scores.date_added = y.recent_date
Upvotes: 3
Reputation: 16249
Just to add my variation on it:
SELECT e.*, s1.score
FROM entities e
INNER JOIN score s1 ON e.id = s1.entity_id
WHERE NOT EXISTS (
SELECT 1 FROM score s2 WHERE s2.id > s1.id
)
Upvotes: 10
Reputation: 74280
SELECT entities.*,
scores.score,
scores.date_added
FROM entities
INNER JOIN scores
ON entities.id = scores.entity_id
WHERE entities.id in
(select id from scores s2 where date_added = max(date_added) and s2.id = entities.id)
ORDER BY scores.date_added DESC
LIMIT 1
Upvotes: 1
Reputation: 562378
I do it this way:
SELECT e.*, s1.score, s1.date_added
FROM entities e
INNER JOIN scores s1
ON (e.id = s1.entity_id)
LEFT OUTER JOIN scores s2
ON (e.id = s2.entity_id AND s1.id < s2.id)
WHERE s2.id IS NULL;
Upvotes: 63