vtrhmd

Reputation: 85

BQ: Select latest date from multiple columns

Good day, all. I wrote a question relating to this earlier, but now I have encountered another problem.

I have to calculate the timestamp difference between the install_time and contributor_time columns. However, I have three contributor_time columns, and I need to select the latest time from those columns first, then subtract it from install_time.

Sample Data

users  install_time  contributor_time_1  contributor_time_2  contributor_time_3
1      8:00          7:45                7:50                7:55
2      10:00         9:15                9:45                9:30
3      11:00         10:30               null                null

For example, in the table above I would need to select contributor_time_3 and subtract it from install_time for user 1. For user 2, I would do the same, but with contributor_time_2.

Sample Results

users  install_time  time_diff_min
1      8:00          5
2      10:00         15
3      11:00         30

The problem I am facing is that 1) the contributor_time columns are in string format and 2) some of them have 'null' string values (which means that I cannot cast them into timestamps).

I created a query, but I am facing an error stating that I cannot subtract a string from a timestamp. So I added safe_cast; however, the time_diff_min results only show up when all three contributor_time columns hold a valid timestamp. For example, in the sample table above, only the first two rows return a value.

The query I have so far is below:

SELECT
  users,
  install_time,
  TIMESTAMP_DIFF(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), MINUTE) as ctct_min 
FROM
  (SELECT 
  users,
  install_time,
  safe_cast(contributor_time_1 as timestamp) as contributor_time_1,
  safe_cast(contributor_time_2 as timestamp) as contributor_time_2,
  safe_cast(contributor_time_3 as timestamp) as contributor_time_3,
  FROM 
  (SELECT 
  users,
  install_time,
  case when contributor_time_1 = 'null' then '0' else contributor_time_1 end as contributor_time_1,
  ....
  FROM datasource
 

Any help to point me in the right direction is appreciated! Thank you in advance!

Upvotes: 0

Views: 1992

Answers (2)

Mikhail Berlyant

Reputation: 173190

Consider the query below:

select users, install_time,
  time_diff(
    parse_time('%H:%M',install_time), 
    greatest(
      parse_time('%H:%M',contributor_time_1), 
      parse_time('%H:%M',contributor_time_2),
      parse_time('%H:%M',contributor_time_3)
    ), 
    minute) as time_diff_min
from `project.dataset.table`    

If applied to the sample data in your question, the output is:

[image: query result]

The above can be refactored slightly into the version below:

create temp function latest_time(arr any type) as ((
  select parse_time('%H:%M',val) time
  from unnest(arr) val
  order by time desc
  limit 1 
));
select users, install_time,
  time_diff(
    parse_time('%H:%M',install_time), 
    latest_time([contributor_time_1, contributor_time_2, contributor_time_3]), 
    minute) as time_diff_min
from `project.dataset.table`

This is less verbose and avoids repeating the parsing call, with the same result, so it is just a matter of preference.
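If the columns hold the literal string 'null' (as the question describes) rather than a real NULL, parse_time raises an error on those values. A minimal sketch of one way around that, assuming the strings are otherwise in %H:%M form: safe.parse_time returns NULL for unparseable input, and max() ignores NULLs. The datasource CTE below is hypothetical sample data that mirrors the question's table.

```sql
-- Hypothetical sample rows copied from the question; 'null' is a string here.
with datasource as (
  select 1 as users, '8:00' as install_time,
         '7:45' as contributor_time_1,
         '7:50' as contributor_time_2,
         '7:55' as contributor_time_3
  union all select 2, '10:00', '9:15', '9:45', '9:30'
  union all select 3, '11:00', '10:30', 'null', 'null'
)
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    -- safe.parse_time turns 'null' (or any unparseable string) into NULL,
    -- and max() skips NULLs, so the latest valid time is returned.
    (select max(safe.parse_time('%H:%M', val))
     from unnest([contributor_time_1, contributor_time_2, contributor_time_3]) as val),
    minute) as time_diff_min
from datasource
order by users;
```

If all three contributor columns are unparseable for a row, the subquery returns NULL and so does time_diff_min for that row.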

Upvotes: 2

Gordon Linoff

Reputation: 1271151

You can use greatest():

select t.*,
       time_diff(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), minute) as diff_min
from t;

Note: this assumes that the columns are of type TIME and that the values are never NULL. The sample data does include nulls for user 3, and for such a row the expression returns NULL.
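One possible way to relax the never-NULL assumption, since greatest() yields NULL as soon as one argument is NULL, is to wrap each column in coalesce() with a sentinel value. This is only a sketch: it assumes the columns are already of type TIME, uses a hypothetical CTE t with the question's sample rows, and picks midnight as the sentinel, which is an arbitrary choice.

```sql
-- Hypothetical sample rows mirroring the question, with TIME-typed columns.
with t as (
  select 1 as users, time '08:00:00' as install_time,
         time '07:45:00' as contributor_time_1,
         time '07:50:00' as contributor_time_2,
         time '07:55:00' as contributor_time_3
  union all select 2, time '10:00:00', time '09:15:00', time '09:45:00', time '09:30:00'
  union all select 3, time '11:00:00', time '10:30:00', null, null
)
select t.*,
       time_diff(
         install_time,
         -- nullif maps the sentinel back to NULL when every column was NULL.
         nullif(
           greatest(
             coalesce(contributor_time_1, time '00:00:00'),
             coalesce(contributor_time_2, time '00:00:00'),
             coalesce(contributor_time_3, time '00:00:00')),
           time '00:00:00'),
         minute) as diff_min
from t
order by users;
```

The caveat is that a genuine contributor time of exactly midnight becomes indistinguishable from a missing value, so the sentinel only makes sense if that time cannot occur in the data.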

Upvotes: 1
