user9984628
user9984628

Reputation: 53

Remove duplicate values between arrays in BigQuery

Let's say that I have the following arrays:

SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array
UNION ALL
SELECT ['A', 'A', 'B'] AS secondary_array

And I want to remove all duplicate values between the arrays (as opposed to within the arrays), so that the final result will be:

SELECT ['C', 'A', 'A'] AS result_array

Any idea how can it be done?

Upvotes: 0

Views: 1967

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
CREATE TEMP FUNCTION DEDUP_ARRAYS(arr1 ANY TYPE, arr2 ANY TYPE) AS ((ARRAY(
  SELECT item FROM (
    SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr1) item UNION ALL
    SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr2) item
  )
  GROUP BY item, pos
  HAVING COUNT(1) = 1
)));
WITH `project.dataset.table` AS (
  SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array, ['A', 'A', 'B'] AS secondary_array
)
SELECT DEDUP_ARRAYS(origin_array, secondary_array) AS result_array
FROM `project.dataset.table`  

with result

Row result_array     
1   A    
    A    
    C      

which is what would SELECT ['C', 'A', 'A'] AS result_array returned

Upvotes: 3

Sergiocon
Sergiocon

Reputation: 19

If you type just UNION instead of UNION ALL it should not take the duplicate values.

Upvotes: 0

Related Questions