Retrieving most recent date from series of dates in a single record / row

Question

Pretty simple schema of a USER_ID and a bunch of dates that correspond to "stages" of the contract cycle. Problem is that I need the most recent date and/or the data element with the most recent date.

Given a schema of:

USER  STRING
date1 Timestamp
date2 Timestamp
date3 Timestamp
date4 Timestamp
date5 Timestamp


USER   date1       date2       date3       date4       date5 
Fred   01/01/1980  02/02/1980  03/15/1980  04/15/1980  05/15/1980
Jane   02/01/1980  03/01/1980  02/15/1980  02/01/1980  02/01/1980

The following code is clunky and assumes that stages are all done in order which isn't always the case.

CASE WHEN date5 is NOT NULL THEN DATE(date5)
     WHEN date4 is NOT NULL THEN DATE(date4)
     WHEN date3 is NOT NULL THEN DATE(date3)
     WHEN date2 is NOT NULL THEN DATE(date2)
ELSE date1 END AS Last_Date,

First Ask: Get most recent date from list

What I'd love to be able to do is a simple MAX(date1, date2, date3, date4, date5) which would give me the most recent date out of the list.

SELECT USERID, MAX(date1, date2, date3, date4, date5)

In the above example is would return:

USERID | Last_Date
Fred   | 05/15/1980
Jane   | 03/01/1980

Second Request: Get most recent data element

Also very useful is to retrieve the actual data element that is the most recent returning "date2" if it is the most recent.

So again:

SELECT USERID, MAX_ELEMENT(date1,date2,date3,date4,date5) AS LATEST_ELEMENT

would retrieve:

USERID | LATEST_ELEMENT
Fred   | date5
Jane   | date2

Seems like there should be a better / more elegant or performant method.

Final Tuning to Account for Nulls and Timestamps (as opposed to dates)

Mikhail Berlyant has a great solution but I've discovered that there are a lot of stages with the same DATE but close TIMESTAMPS as follows:

DATE FORMATS

Chooses the first DATE so if Stage1 and Stage2 were two hours apart on the same day then Stage1 is considered the last which is incorrect here.

Id     date1       date2        last_col  last_date 
FRED   2020-09-14  2020-09-14   date1     2020-09-14
JANE   2019-04-13  2019-04-13   date1     2019-04-13

Using timestamps also gets some incorrect results due to the time not being accounted for:

Id   date1                   date2                    last_col  last_date   
FRED 2020-08-13 23:35:38 UTC 2020-08-13 23:49:06 UTC  date1     2020-08-13T23
JANE 2021-01-07 02:16:41 UTC null                     date2     null
AMY  2021-01-07 02:35:08 UTC 2021-01-07 23:26:20 UTC  date2     2021-01-07T23

Apparently hours are handled but not minutes or seconds. Also nulls apparently evaluate ahead of any other but should actually be discarded as they are no action taken fields.

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL

#standardSQL
select user, 
  array_agg(
    struct(split(col_value, ':')[offset(0)] as last_col, split(col_value, ':')[offset(1)] as last_date) 
    order by parse_date('%m/%d/%Y', split(col_value, ':')[offset(1)]) desc 
    limit 1
  )[offset(0)].* 
from `project.dataset.yourtable` t,
unnest(split(translate(to_json_string((select as struct t.* except(user))), '{}"', ''))) col_value
group by user

If to apply to sample data from your question

with `project.dataset.yourtable` as (
  select 'Fred' user, '01/01/1980' date1, '02/02/1980' date2, '03/15/1980' date3, '04/15/1980' date4, '05/15/1980' date5 union all
  select 'Jane', '02/01/1980', '03/01/1980', '02/15/1980', '02/01/1980', '02/01/1980'
)

output is

In case if your dateN columns are of timestamp data type - use below version

select user, 
  array_agg(
    struct(split(col_value, ':')[offset(0)] as last_col, regexp_extract(col_value, ':(.*)') as last_date) 
    order by regexp_extract(col_value, ':(.*)') desc 
    limit 1
  )[offset(0)].* 
from `project.dataset.yourtable` t,
unnest(split(translate(to_json_string((select as struct t.* except(user))), '{}"', ''))) col_value
group by user

Retrieving most recent date from series of dates in a single record / row

First Ask: Get most recent date from list

Second Request: Get most recent data element

Final Tuning to Account for Nulls and Timestamps (as opposed to dates)

DATE FORMATS

Answers (2)

Related Questions