Graham Place
Graham Place

Reputation: 308

Correctly Migrate Postgres least() Behavior to BigQuery

I'm attempting to migrate postgres scripts over to bigquery, with the end goal of both scripts returning the exact same tables (schema and values).

I'm running into an issue when trying to replicate the behavior of least() in postgres in my bigquery selects.

In postgres, if any parameters of the least() call are null, they are skipped and the least non-null value is returned. In bigquery, however, if any of the parameters of the least() call are null, the function automatically returns null.

I'm looking for an elegant solution to replicate the postgres least() behavior in bigquery. My current—clunky—solution is below:

Postgres (returns -1):

SELECT LEAST(1, 0, -1, null)

BigQuery (returns null):

SELECT LEAST(1, 0, -1, null)

Postgres (returns -1):

SELECT LEAST(COALESCE(1, 0, -1, null),
             COALESCE(0, 1, -1, null),
             COALESCE(-1, 0, 1, null),
             COALESCE(null, 0, -1, 1))

BigQuery (returns -1):

SELECT LEAST(COALESCE(1, 0, -1, null),
             COALESCE(0, 1, -1, null),
             COALESCE(-1, 0, 1, null),
             COALESCE(null, 0, -1, 1))

This works but is a less-than-ideal solution.

In the original postgres script I need to migrate, there is nested logic like least(w, x, least(y, z)) so that fix gets exponentially more unreadable as the number of values/complexity grows. That same issue applies when you try to do this as a massive CASE block.

If anyone has an obvious fix that I'm missing or a more elegant way to mirror the postgres behavior in bigquery, it is much appreciated!

Upvotes: 10

Views: 4449

Answers (5)

Timogavk
Timogavk

Reputation: 889

How about this? :) "The Postgres library" :)

    DECLARE input STRING DEFAULT (
    WITH t AS 
    (
        SELECT 1 a, 0 b, -1 c, null d
        UNION ALL 
        SELECT  0, 1, -1, null
        UNION ALL 
        SELECT -1, 0, 1, null
        UNION ALL 
        SELECT null, 0, -1, 1
    )
    SELECT  '['||STRING_AGG("'"||TO_JSON_STRING(t)||"'")||']' FROM t    
)
;
EXECUTE IMMEDIATE '''  
    SELECT * FROM EXTERNAL_QUERY("project.location.connection", 
    \'\'\'
    SELECT 
    GREATEST (
        (t::json->>'a')::INT,
        (t::json->>'b')::INT,
        (t::json->>'c')::INT,
        (t::json->>'d')::INT
    ) 
    FROM
    UNNEST (ARRAY '''||input||")  AS t ''')"

Of course it may improve with dynamic "body" for GREATEST but I hope nobody will use it.

It's too sad to live in the world with no IGNORE NULLS for GREATEST and LEAST...as well as impossible to direct string variable into EXTERNAL_QUERY 😔 😔 😔

Questions to community

  1. Does anyone know about limitations for this "method"?
  2. How long string is allowed to execute in EXECUTE IMMEDIATE?
  3. as argument for EXTERNAL_QUERY?

Upvotes: 0

Letsios Matthaios
Letsios Matthaios

Reputation: 306

Maybe something like this could work?

WITH tbl AS(
    SELECT 1 AS a, 2  AS b
    UNION ALL SELECT NULL, 2
    UNION ALL SELECT 1, NULL
    UNION ALL SELECT NULL, NULL
)
SELECT 
    tbl.*
    , COALESCE( LEAST(a, b), a , b) 
FROM tbl

Upvotes: 1

Scott Schaen
Scott Schaen

Reputation: 111

Without a function:

select 
  (select min(col) from unnest([a,b,c,d,e]) col) least,
  (select max(col) from unnest([a,b,c,d,e]) col) greatest,
  *
from
(
  select 1 a, 2 b, 3 c, null d, 5 e
  union all
  select null a, null b, null c, null d, null e
) tbl

Upvotes: 8

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173056

There is a simple workaround for BigQuery Standard SQL

You just create your own function (let's say myLeast)
It works for "standalone" as well as in nested scenario

#standardSQL
CREATE TEMP FUNCTION myLeast(x ARRAY<INT64>) AS
((SELECT MIN(y) FROM UNNEST(x) AS y));
SELECT 
  LEAST(1, 0, -1, NULL) AS least_standard,  
  LEAST(COALESCE(1, 0, -1, NULL),
    COALESCE(0, 1, -1, NULL),
    COALESCE(-1, 0, 1, NULL),
    COALESCE(NULL, 0, -1, 1)) AS least_less_than_ideal,
  myLeast([1, 0, -1, NULL]) AS least_workaround,
  myLeast([1, 0, -1, NULL, myLeast([2, 0, -2, NULL])]) AS least_with_nested  

Output is

least_standard  least_less_than_ideal   least_workaround    least_with_nested    
null            -1                      -1                  -2   

first two is from your question - third and forth are "standalone" and nested workaround

Hope you can apply this approach to your specific case

Upvotes: 8

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14014

Both Oracle and Vertica behave the same as BigQuery, following general rule of SQL functions - if one of the arguments is NULL - the result is NULL. PostgreSQL makes an exception to that rule, explicitly stating in documentation:

The result will be NULL only if all the expressions evaluate to NULL.

Note that GREATEST and LEAST are not in the SQL standard, but are a common extension. Some other databases make them return NULL if any argument is NULL, rather than only when all are NULL.

I would open Feature Request in BigQuery issue tracker to add IGNORE NULLS parameter to LEAST and GREATEST to get PostgreSQL compatible behavior. Even though normally IGNORE NULLS only applies to aggregate functions, LEAST and GREATEST are kind of similar to aggregate functions.

Upvotes: 6

Related Questions