Martin Burch
Martin Burch

Reputation: 2931

Is there a way to do LEFT JOIN LATERAL with BigQuery?

Given some rows with duplicate names and different timestamps, I would like to select the row with the newest timestamp, if the duplicate name occurs within say, 45 minutes, of the first timestamp.

Here's what worked in PostgreSQL:

SELECT i.ts AS base_timestamp, j.ts AS newer_timestamp, i.name
FROM tbl i
LEFT JOIN LATERAL
(SELECT j.ts
 FROM tbl j
 WHERE i.name = j.name
 AND j.ts > i.ts
 AND j.ts < (i.ts + INTERVAL '45 minutes')
) j ON TRUE
WHERE j.ts is NULL

Great explanation of LATERAL here: https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral

LATERAL join is like a SQL foreach loop, in which PostgreSQL will iterate over each row in a result set and evaluate a subquery using that row as a parameter.

So it's like a correlated subquery, but in the join.

Then I simply take only the rows where there is no newer timestamp (WHERE j.ts is NULL).

How can I do this in BigQuery?

EDIT: I've created an example of the PostgreSQL grouping on SQLFiddle as requested in the comments.

Input:

  ('Duplication Example','2019-06-22 19:10:25'),
  ('Duplication Example','2019-06-22 23:58:31'), 
  ('Duplication Example','2019-06-23 00:08:00')

Output (middle row having timestamp 23:58:31 removed):

base_timestamp          newer_timestamp name
2019-06-22T19:10:25Z    (null)          Duplication Example
2019-06-23T00:08:00Z    (null)          Duplication Example

Upvotes: 3

Views: 6984

Answers (2)

DinushaNT
DinushaNT

Reputation: 1147

This can be archived with a WINDOW function.

SELECT
  name,
  MAX(timestamp) AS timestamp_new
FROM
(
  SELECT 
    i.name,
    COUNT(*) OVER (PARTITION BY i.name ORDER BY i.ts RANGE BETWEEN 45 * 60 * 1000 PRECEDING AND CURRENT ROW) as 45_window_count,
    i.ts AS timestamp
  FROM 
    tbl i
)
WHERE 45_window_count > 1
GROUP BY user

Upvotes: 1

Martin Weitzmann
Martin Weitzmann

Reputation: 4746

Your case looks kind of like a task for window functions. But since you seem to be interested in lateral joins more than in solving the problem you presented: In BigQuery there is afaik only an implicit version of lateral joins: when joining with unnested arrays.

This showcases the idea:

WITH t AS (
  SELECT 'a' as id, [2,3] as arr 
  UNION ALL SELECT 'b', [56, 7]
)

SELECT * EXCEPT(arr) 
FROM t LEFT JOIN UNNEST(arr)

Upvotes: 1

Related Questions