random483423948295
random483423948295

Reputation: 15

Distinct from array in Big Query with a twist (only if adjacent)

I've been through the array documentation in Big Query and found that I can use UNNEST and DISTINCT to remove duplicates in an array content, but I want to remove the duplicates only if they are adjacent in the array (as it's an ordered list).

For example, for this input:

[a, a, b, a, a, c, b, b]

Expected output would be:

[a, b, a, c, b]

Any ideas appreciated.

Upvotes: 0

Views: 275

Answers (2)

Jaytiger
Jaytiger

Reputation: 12254

You may consider another approach using set operation.

WITH sample_data AS (
  SELECT ['a', 'a', 'b', 'a', 'a', 'c', 'b', 'b'] arr
)
SELECT *,
       ARRAY(
         SELECT e FROM (
            SELECT e, o FROM t.arr e WITH offset o
            EXCEPT DISTINCT
            SELECT e, o + 1 FROM t.arr e WITH offset o
         ) ORDER BY o
       ) AS distinct_arr
  FROM sample_data t;

Query results

enter image description here

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Use below approach

select *, array( 
  select any_value(el)
    from (
      select as struct *, countif(flag) over(order by offset) grp
      from (
        select offset, el, ifnull(el != lag(el) over(order by offset), true) flag
        from t.arr as el with offset
      )
    )
    group by grp
    order by min(offset)
  )
from your_table t           

if applied to sample data in your question - output is

enter image description here

Upvotes: 0

Related Questions