Reputation: 17082
Since Postgres came out with the ability to do LATERAL
joins, I've been reading up on it, since I currently do complex data dumps for my team with lots of inefficient subqueries that make the overall query take four minutes or more.
I understand that LATERAL
joins may be able to help me, but even after reading articles like this one from Heap Analytics, I still don't quite follow.
What is the use case for a LATERAL
join? What is the difference between a LATERAL
join and a subquery?
Upvotes: 351
Views: 393193
Reputation: 658997
LATERAL
join?The feature was introduced with PostgreSQL 9.3. The manual:
Subqueries appearing in
FROM
can be preceded by the key wordLATERAL
. This allows them to reference columns provided by precedingFROM
items. (WithoutLATERAL
, each subquery is evaluated independently and so cannot cross-reference any otherFROM
item.)Table functions appearing in
FROM
can also be preceded by the key wordLATERAL
, but for functions the key word is optional; the function's arguments can contain references to columns provided by precedingFROM
items in any case.
Basic code examples are given there.
A LATERAL
join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL
join are evaluated once for each row left of it - just like a correlated subquery - while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)
Related answer with code examples for both side by side, solving the same problem:
For returning more than one column, a LATERAL
join is typically simpler, cleaner and faster.
Also, remember that the equivalent of a correlated subquery is LEFT JOIN LATERAL ... ON true
:
There are things that a LATERAL
join can do, but a (correlated) subquery cannot (easily). A correlated subquery can only return a single value, not multiple columns and not multiple rows - with the exception of bare function calls (which multiply result rows if they return multiple rows). But even certain set‑returning functions are only allowed in the FROM
clause. Like unnest()
with multiple parameters in Postgres 9.4 or later. The manual:
This is only allowed in the
FROM
clause;
So this works, but cannot (easily) be replaced with a subquery:
CREATE TABLE tbl (a1 int[], a2 int[]);
SELECT * FROM tbl, unnest(a1, a2) u(elem1, elem2); -- implicit LATERAL
The comma (,
) in the FROM
clause is short notation for CROSS JOIN
.
LATERAL
is assumed automatically for table functions.
About the special case of UNNEST( array_expression [, ... ] )
:
SELECT
listYou can also use set-returning functions like unnest()
in the SELECT
list directly. This used to exhibit surprising behavior with more than one such function in the same SELECT
list up to Postgres 9.6. But it has finally been sanitized with Postgres 10 and is a valid alternative now (even if not standard SQL). See:
Building on above example:
SELECT *, unnest(a1) AS elem1, unnest(a2) AS elem2
FROM tbl;
Comparison:
fiddle for pg 9.6
fiddle for pg 10
To note: a (combination of) set-returning function(s) in the SELECT
list that produces no rows eliminates the row. Internally it translates to CROSS JOIN LATERAL ROWS FROM ...
, not to LEFT JOIN LATERAL ... ON true
!
fiddle for pg 16 demonstrating the difference.
For the
INNER
andOUTER
join types, a join condition must be specified, namely exactly one ofNATURAL
,ON
join_condition, orUSING
(join_column [, ...]). See below for the meaning.
ForCROSS JOIN
, none of these clauses can appear.
So these two queries are valid (even if not particularly useful):
SELECT *
FROM tbl t
LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t ON true;
SELECT *
FROM tbl t, LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
While this one is not:
SELECT *
FROM tbl t
LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
That's why Andomar's code example is correct (the CROSS JOIN
does not require a join condition) and Attila's is was not.
Upvotes: 400
Reputation: 154130
Having the following blog
database table storing the blogs hosted by our platform:
And, we have two blogs currently hosted:
id | created_on | title | url |
---|---|---|---|
1 | 2013-09-30 | Vlad Mihalcea's Blog | https://vladmihalcea.com |
2 | 2017-01-22 | Hypersistence | https://hypersistence.io |
We need to build a report that extracts the following data from the blog
table:
If you're using PostgreSQL, then you have to execute the following SQL query:
SELECT
b.id as blog_id,
extract(
YEAR FROM age(now(), b.created_on)
) AS age_in_years,
date(
created_on + (
extract(YEAR FROM age(now(), b.created_on)) + 1
) * interval '1 year'
) AS next_anniversary,
date(
created_on + (
extract(YEAR FROM age(now(), b.created_on)) + 1
) * interval '1 year'
) - date(now()) AS days_to_next_anniversary
FROM blog b
ORDER BY blog_id
As you can see, the age_in_years
has to be defined three times because you need it when calculating the next_anniversary
and days_to_next_anniversary
values.
And, that's exactly where LATERAL JOIN can help us.
The following relational database systems support the LATERAL JOIN
syntax:
SQL Server can emulate the LATERAL JOIN
using CROSS APPLY
and OUTER APPLY
.
LATERAL JOIN allows us to reuse the age_in_years
value and just pass it further when calculating the next_anniversary
and days_to_next_anniversary
values.
The previous query can be rewritten to use the LATERAL JOIN, as follows:
SELECT
b.id as blog_id,
age_in_years,
date(
created_on + (age_in_years + 1) * interval '1 year'
) AS next_anniversary,
date(
created_on + (age_in_years + 1) * interval '1 year'
) - date(now()) AS days_to_next_anniversary
FROM blog b
CROSS JOIN LATERAL (
SELECT
cast(
extract(YEAR FROM age(now(), b.created_on)) AS int
) AS age_in_years
) AS t
ORDER BY blog_id
And, the age_in_years
value can be calculated one and reused for the next_anniversary
and days_to_next_anniversary
computations:
blog_id | age_in_years | next_anniversary | days_to_next_anniversary |
---|---|---|---|
1 | 7 | 2021-09-30 | 295 |
2 | 3 | 2021-01-22 | 44 |
Much better, right?
The age_in_years
is calculated for every record of the blog
table. So, it works like a correlated subquery, but the subquery records are joined with the primary table and, for this reason, we can reference the columns produced by the subquery.
Upvotes: 58
Reputation: 23259
One thing no one has pointed out is that you can use LATERAL
queries to apply a user-defined function on every selected row.
For instance:
CREATE OR REPLACE FUNCTION delete_company(companyId varchar(255))
RETURNS void AS $$
BEGIN
DELETE FROM company_settings WHERE "company_id"=company_id;
DELETE FROM users WHERE "company_id"=companyId;
DELETE FROM companies WHERE id=companyId;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM (
SELECT id, name, created_at FROM companies WHERE created_at < '2018-01-01'
) c, LATERAL delete_company(c.id);
That's the only way I know how to do this sort of thing in PostgreSQL.
Upvotes: 27
Reputation: 14721
First, Lateral and Cross Apply is same thing. Therefore you may also read about Cross Apply. Since it was implemented in SQL Server for ages, you will find more information about it then Lateral.
Second, according to my understanding, there is nothing you can not do using subquery instead of using lateral. But:
Consider following query.
Select A.*
, (Select B.Column1 from B where B.Fk1 = A.PK and Limit 1)
, (Select B.Column2 from B where B.Fk1 = A.PK and Limit 1)
FROM A
You can use lateral in this condition.
Select A.*
, x.Column1
, x.Column2
FROM A LEFT JOIN LATERAL (
Select B.Column1,B.Column2,B.Fk1 from B Limit 1
) x ON X.Fk1 = A.PK
In this query you can not use normal join, due to limit clause. Lateral or Cross Apply can be used when there is not simple join condition.
There are more usages for lateral or cross apply but this is most common one I found.
Upvotes: 23
Reputation: 238296
The difference between a non-lateral
and a lateral
join lies in whether you can look to the left hand table's row. For example:
select *
from table1 t1
cross join lateral
(
select *
from t2
where t1.col1 = t2.col1 -- Only allowed because of lateral
) sub
This "outward looking" means that the subquery has to be evaluated more than once. After all, t1.col1
can assume many values.
By contrast, the subquery after a non-lateral
join can be evaluated once:
select *
from table1 t1
cross join
(
select *
from t2
where t2.col1 = 42 -- No reference to outer query
) sub
As is required without lateral
, the inner query does not depend in any way on the outer query. A lateral
query is an example of a correlated
query, because of its relation with rows outside the query itself.
Upvotes: 150