franco pina
franco pina

Reputation: 333

STRING_AGG in Bigquery

I have a problem with STRING_AGG in Bigquery. I'm trying:

SELECT
 id,
 institution,
 COUNT(DISTINCT institution)  OVER (PARTITION BY id) as count_intitution
 STRING_AGG(DISTINCT institution,"," )  OVER (PARTITION BY id) as list_intitution
FROM
 name_table
WHERE
 DATE(created_at) = "2020-02-02"

and i get this error:

Analytic function string_agg does not support DISTINCT.

BQ documentation says it allows the use of "DISTINCT"

https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg

But apparently it doesn't support "partition by", why?

EDIT:

the current table is like this (it is an example, the table has more attributes)

|id |institution|
|1  | a         |
|1  | b         |
|2  | a         |
|2  | c         |
|3  | a         |
|1  | a         |

and what I want to achieve is

|id|count_institution|list_institution|
|1 |2                |a,b             |
|2 |2                |a,c             |
|3 |1                |a               |

Upvotes: 10

Views: 25039

Answers (3)

Rajat
Rajat

Reputation: 5803

Updated based on your updated question. You could simply not use window functions.

with cte1 as
(select distinct id, institution
from name_table
where date(created_at) = "2020-02-02")

select id, count(institution) count_inst, string_agg(institution,"," ) list_inst
from cte1 
group by id;

Outputs

+----+------------+-----------+
| id | count_inst | list_inst |
+----+------------+-----------+
|  1 |          2 | a,b       |
|  2 |          2 | a,c       |
|  3 |          1 | a         |
+----+------------+-----------+

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
SELECT * 
  REPLACE((
      SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
    ) AS list_intitution
  ) 
FROM (
  SELECT
   id,
   institution,
   COUNT(DISTINCT institution)  OVER (PARTITION BY id) AS count_intitution,
   ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
  FROM
   name_table
  WHERE
   DATE(created_at) = "2020-02-02"
) t  

Note: in your original query you just remove DISTINCT and use ARRAY_AGG instead of STRING_AGG, but then in outer query you process this array to form list of distinct values from that array

Below is answer on your updated question

You can simply use GROUP BY as in below example

#standardSQL
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

If to apply to sample data from your question, as in below example

#standardSQL
WITH name_table AS (
  SELECT 1 id, 'a' institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a' 
)
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

result is

Row id  count_institution   list_institution     
1   1   2                   a,b  
2   2   2                   a,c  
3   3   1                   a    

Upvotes: 7

Gordon Linoff
Gordon Linoff

Reputation: 1269445

You can easily work around this:

SELECT id, institution,
       COUNT(DISTINCT institution)  OVER (PARTITION BY id) as list_intitution
       STRING_AGG(CASE WHEN seqnum = 1 THEN institution END, ',')  OVER (PARTITION BY id) as list_intitution
FROM (SELECT t.*, 
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
      FROM name_table
      WHERE DATE(created_at) = '2020-02-02'
     ) t

Upvotes: 0

Related Questions