CODEWITHSUNDEEP

sqlmergegoogle-bigqueryduplicates

Yannic Jänike

Reputation: 115

Google BigQuery [Standard SQL] - merge rows only on selected columns, array agg the rest

Standard SQL

Hey everyone,

I have a big query table that approximately look like this

team_id, channel_id, value_1, value_2,...
A        1111        xxx      yyy
B        1111        xxx      yyy
C        2222        uuu      vvv
D        2222        uuu      vvv
E        3333        grr      sss

So as you can see, currently, everything is indexed by team_id while all the data that has the same channel_id is same. I need the data to be indexed by channel_id. The result should look like this:

team_id, channel_id, value_1, value_2,...
[A,B]    1111        xxx      yyy
[C,D]    2222        uuu      vvv
[E]      3333        grr      sss

Such that rows with same channel ID are reduced to one row, but the team_id is merged into an array. Is this possible? I am still quite new to SQL/BigQuery and don't know how to do this. I would be happy for some help!

Big thank in advance!

Upvotes: 1

Views: 285

Answers (2)

Mikhail Berlyant

Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

... all the data that has the same channel_id is same ...

So, below eliminates needs in explicitly listing all those columns so code the same for any number of those columns and their names

#standardSQL
SELECT ARRAY_AGG(team_id) AS team_id, ANY_VALUE(str).*
FROM (
  SELECT team_id,
    (SELECT AS STRUCT * EXCEPT(team_id) FROM UNNEST([t])) AS str
  FROM `project.dataset.table` t
)
GROUP BY TO_JSON_STRING(str)

Depends on your coding preferences - you can use refactored version of above

#standardSQL
SELECT ARRAY_AGG(team_id) AS team_id, 
  ANY_VALUE((SELECT AS STRUCT * EXCEPT(team_id) FROM UNNEST([t]))).*
FROM `project.dataset.table` t
GROUP BY TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(team_id) FROM UNNEST([t])))

In both cases, if to apply to sample data from your question - the output as below

Row team_id channel_id  value_1 value_2  
1   A       1111        xxx     yyy  
    B                
2   C       2222        uuu     vvv  
    D                
3   E       3333        grr     sss

Upvotes: 1

Reputation: 1269813

I thin you just want aggregation:

select array_agg(team_id) as team_ids, channel_id, value_1, value_2
from t
group by channel_id, value_1, value_2;

Upvotes: 1

Related Questions