Fei
Fei

Reputation: 1061

BigQuery: join 2 tables but only choosing rows based on date column

I have 2 tables

the first table table_new_data is like

date    type  data
2022-01 t1    0
2022-03 t2    1
2021-08 t1    1

the second table table_old_data is like

date    type  data
2021-10 t1    2
2022-04 t2    3
2021-07 t1    4
2021-06 t1    5

I'd like a sql code snippet that table_new_data LEFT JOIN table_old_data and produce the following result.

new_date type new_data old_date  old_data
2022-01  t1   0        2021-10   2
2022-03  t2   1        null      null
2021-08  t1   1        2021-07   4

Please note that,

date is in YYYY-MM.

Upvotes: 1

Views: 817

Answers (1)

Yev Guyduy
Yev Guyduy

Reputation: 1549

See below... Think about edge cases.

--  ---------------------------------------------------------------------------------
--  create dummy NEW table 
--  ---------------------------------------------------------------------------------
WITH
  table_new_data AS (
  SELECT
    '2022-01' AS date,
    't1' AS type,
    0 AS DATA
  UNION ALL
  SELECT
    '2022-03',
    't2' ,
    1 
  UNION ALL
  SELECT
    '2021-08' ,
    't1' ,
    1  ),
--  ---------------------------------------------------------------------------------
--  create dummy OLD table 
--  ---------------------------------------------------------------------------------
  table_old_data AS (
  SELECT
    '2021-10' AS date,
    't1' AS type,
    2 AS DATA
  UNION ALL
  SELECT
    '2022-04',
    't2',
    3 
  UNION ALL
  SELECT
    '2021-07',
    't1',
    4
  UNION ALL
  SELECT
    '2021-06',
    't1',
    5),
--  ---------------------------------------------------------------------------------
--  create joined tables based on dates from old table being LOWER (may need <=??)
--  create order = ROW_NUMBER() function to see which date is closest from old table
--  make sure to test on edge cases where dates are the same or equal to
--  ---------------------------------------------------------------------------------
  ordered AS (
  SELECT
    nd.date AS new_date,
    nd.type,
    nd.DATA AS new_data,
    od.date AS old_date,
    od.DATA AS old_data,
    ROW_NUMBER() OVER(PARTITION BY nd.type, nd.date ORDER BY nd.date ) AS rn
  FROM
    table_new_data nd
  LEFT JOIN
    table_old_data od
  ON
    nd.type = od.type
    AND od.date < nd.date )
--  ---------------------------------------------------------------------------------
--  final table to reproduce desired output in question
--  ---------------------------------------------------------------------------------
SELECT
  * EXCEPT(rn)
FROM
  ordered
WHERE
  rn = 1

Output:

enter image description here

Upvotes: 2

Related Questions