Reputation: 127
Let's suppose, I have the following tables:
projects
versions
owners
I would like to retrieve a set of projects, and I would like to join only the newest version for each project. A DISTINCT query isn't good for this purpose, because it limits the colums returned from the query, however I need all the columns.
This is what I came up with:
SELECT
projects.id, projects.name, projects.owner_id
FROM
projects
INNER JOIN
(SELECT versions.id, versions.project_id, versions.created_at, versions.tag FROM versions ORDER BY versions.created_at DESC LIMIT 1)
ON
projects.id = versions.project_id
WHERE
projects.owner_id = 1
More generally, I would like to return a limited number of joined results based on a specific criteria.
But I'm wondering, is there a better solution for this? A nested query doesn't seem to be elegant, and probably isn't efficient either. Primarily I'm looking for an SQLite compatible solution, but if there are better ways for other SQL engines, I'm interested in it too, as I already bumped into to this problem in the past with MySQL for example.
UPDATE: I would like to give a further illustration of the problem by a wrong solution. The simplest way would be to do this:
SELECT
projects.id, projects.name, versions.tag, versions.created_at
FROM
projects
INNER JOIN
versions
ON
projects.id = versions.project_id
WHERE
projects.owner_id = 1
ORDER BY
versions.created_at DESC;
This is just a simple JOIN structure. Let's suppose, it will give the following dataset as a result:
# id name tag created_at
1 1 project1 v2.3 2018-05-08 18:33:42
2 5 project5 v4.0 2018-05-08 11:19:07
3 3 project3 v1.8 2018-05-07 21:41:49
4 5 project5 v3.18 2018-05-07 07:00:26
5 8 project8 v3.12 2018-05-06 08:59:01
6 11 project11 v1.9 2018-05-05 14:27:31
7 5 project5 v3.17 2018-05-04 11:48:22
8 1 project1 v2.2 2018-05-04 05:11:46
9 2 project2 v5.5 2018-05-03 23:08:57
10 7 project7 v6.8 2018-05-03 12:17:33
The solution is wrong, because it returns unnecessary rows too. What I would need from this dataset, is row #1, 2, 3, 5, 6, 9, 10. Because it is ordered by versions.created_at field in descending order, and I need the first result, i.e. the most recent of each project.
This is why I would need some sort of "selective LIMIT", which takes the versions.project_id field into account, similarly as a SELECT DISTINCT query would do, however, I need to return nearly all the fields from the table, not only the versions.project_id.
I am wondering, is there a "well-known" solution to this problem, which I don't know about?
Upvotes: 0
Views: 54
Reputation: 633
Using ROW_NUMBER with partition on id of projects and order of created_at like this:
SELECT *
FROM
(SELECT
*, ROW_NUMBER() OVER(PARTITION BY project_id ORDER BY created_at DESC) AS row_no
FROM
projects
INNER JOIN
versions ON projects.id = versions.project_id) AS tbl
WHERE row_no = 1
Upvotes: 1