Reputation: 53
Let's say that I have the following arrays:
SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array
UNION ALL
SELECT ['A', 'A', 'B'] AS secondary_array
And I want to remove all duplicate values between the arrays (as opposed to within the arrays), so that the final result will be:
SELECT ['C', 'A', 'A'] AS result_array
Any idea how can it be done?
Upvotes: 0
Views: 1967
Reputation: 172944
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION DEDUP_ARRAYS(arr1 ANY TYPE, arr2 ANY TYPE) AS ((ARRAY(
SELECT item FROM (
SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr1) item UNION ALL
SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr2) item
)
GROUP BY item, pos
HAVING COUNT(1) = 1
)));
WITH `project.dataset.table` AS (
SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array, ['A', 'A', 'B'] AS secondary_array
)
SELECT DEDUP_ARRAYS(origin_array, secondary_array) AS result_array
FROM `project.dataset.table`
with result
Row result_array
1 A
A
C
which is what would SELECT ['C', 'A', 'A'] AS result_array
returned
Upvotes: 3
Reputation: 19
If you type just UNION
instead of UNION ALL
it should not take the duplicate values.
Upvotes: 0