Reputation: 872
I have a table like this:
CUSTOMERS_ID DATE_SALES DIMENSION
MARIO1 20200201 NULL
MARIO1 20200113 Spain
MARIO2 20200131 NULL
MARIO3 20200101 France
MARIO3 20191231 Spain
and I need to order by CUSTOMERS_ID and DATE_SALES DESC fields. Then I want to group by CUSTOMERS_ID field and get first not null value of DIMENSION field. The output table would be:
CUSTOMERS_ID DIMENSION
MARIO1 Spain
MARIO2 NULL
MARIO3 France
Any ideas? I have tried COALESCE
functions, FIRST_VALUE
, and I have not the results I had expected.
Thanks in advance!
Upvotes: 7
Views: 4180
Reputation: 2365
You can group by customer id and use ARRAY_AGG by ignoring NULLS, and you can also order by date in that field. Limit 1 will make it more efficient by using less RAM storage. And then, OFFSET(0) will make it an unnested field, so you can use that field easily.
WITH
raw_data AS
(
SELECT 'MARIO1' CUSTOMERS_ID, 20200201 DATE_SALES, NULL as DIMENSION UNION ALL
SELECT 'MARIO1' CUSTOMERS_ID, 20200113 DATE_SALES, 'Spain' as DIMENSION UNION ALL
SELECT 'MARIO2' CUSTOMERS_ID, 20200131 DATE_SALES, NULL as DIMENSION UNION ALL
SELECT 'MARIO3' CUSTOMERS_ID, 20200101 DATE_SALES, 'France' as DIMENSION UNION ALL
SELECT 'MARIO3' CUSTOMERS_ID, 20191231 DATE_SALES, 'Spain' as DIMENSION
)
SELECT CUSTOMERS_ID, ARRAY_AGG(DIMENSION IGNORE NULLS ORDER BY DATE_SALES DESC LIMIT 1)[OFFSET(0)] as DIMENSION
FROM raw_data
GROUP BY 1
Upvotes: 10
Reputation: 173190
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY IF(DIMENSION IS NULL, NULL, DATE_SALES) DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY CUSTOMERS_ID
if to apply to sample data from your question - result is
Row CUSTOMERS_ID DATE_SALES DIMENSION
1 MARIO1 20200113 Spain
2 MARIO2 20200131 null
3 MARIO3 20200101 France
Upvotes: 3
Reputation: 522732
We can just use a ROW_NUMBER
trick here:
WITH cte AS (
SELECT CUSTOMERS_ID,
ROW_NUMBER() OVER (PARTITION BY CUSTOMERS_ID
ORDER BY -1.0*UNIX_SECONDS(DATE_SALES) DESC) rn
FROM yourTable
)
SELECT CUSTOMERS_ID, DIMENSION
FROM cte
WHERE rn = 1
ORDER BY CUSTOMERS_ID;
The logic is to order row number descending by the negative number of seconds since the epoch. This places more recent sales first, and also places NULL
at the very end, so a NULL
value would only receive row number 1 in the event that no non NULL
dimension data be available.
Upvotes: 1