Books Books
Books Books

Reputation: 29

window function to find max date sql bigquery

In code below I am trying to loop over the values in column edited_date while comparing it to pageview_date column's value. The goal is to grab MAX value from edited_date that is no later than the row's value of pageview_date.

with sample as (
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-03-01') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-05') as pageview_date, DATE('2017-02-28') as edited_date
)
SELECT
  id,
  pageview_date,
  MAX(IF(edited_date <= pageview_date, edited_date, null)) OVER (ORDER BY pageview_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as new_edited_date
FROM sample

Returns: enter image description here

Desired output:

id          pageview_date         new_edited_date
a           2022-02-27             2022-01-28
a           2022-02-27             2022-01-28
a           2022-03-01             2022-03-01
a           2022-03-01             2022-03-01
a           2022-03-05             2022-03-01

Upvotes: 0

Views: 1061

Answers (1)

Dami&#227;o Martins
Dami&#227;o Martins

Reputation: 1849

An approach using subselect instead of window functions:

with sample as (
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-03-01') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-05') as pageview_date, DATE('2017-02-28') as edited_date
)
SELECT
  s1.id,
  s1.pageview_date,
  (SELECT MAX(edited_date) FROM sample WHERE id = s1.id AND edited_date <= s1.pageview_date) as new_edited_date
FROM sample s1

Output:

id  pageview_date   new_edited_date
a   2022-02-27      2022-01-28
a   2022-02-27      2022-01-28
a   2022-03-01      2022-03-01
a   2022-03-01      2022-03-01
a   2022-03-05      2022-03-01

Upvotes: 2

Related Questions