Reputation: 333
I have a problem with STRING_AGG in Bigquery. I'm trying:
SELECT
id,
institution,
COUNT(DISTINCT institution) OVER (PARTITION BY id) as count_intitution
STRING_AGG(DISTINCT institution,"," ) OVER (PARTITION BY id) as list_intitution
FROM
name_table
WHERE
DATE(created_at) = "2020-02-02"
and i get this error:
Analytic function string_agg does not support DISTINCT.
BQ documentation says it allows the use of "DISTINCT"
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg
But apparently it doesn't support "partition by", why?
EDIT:
the current table is like this (it is an example, the table has more attributes)
|id |institution|
|1 | a |
|1 | b |
|2 | a |
|2 | c |
|3 | a |
|1 | a |
and what I want to achieve is
|id|count_institution|list_institution|
|1 |2 |a,b |
|2 |2 |a,c |
|3 |1 |a |
Upvotes: 10
Views: 25039
Reputation: 5803
Updated based on your updated question. You could simply not use window functions
.
with cte1 as
(select distinct id, institution
from name_table
where date(created_at) = "2020-02-02")
select id, count(institution) count_inst, string_agg(institution,"," ) list_inst
from cte1
group by id;
Outputs
+----+------------+-----------+
| id | count_inst | list_inst |
+----+------------+-----------+
| 1 | 2 | a,b |
| 2 | 2 | a,c |
| 3 | 1 | a |
+----+------------+-----------+
Upvotes: 0
Reputation: 172944
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
REPLACE((
SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
) AS list_intitution
)
FROM (
SELECT
id,
institution,
COUNT(DISTINCT institution) OVER (PARTITION BY id) AS count_intitution,
ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
FROM
name_table
WHERE
DATE(created_at) = "2020-02-02"
) t
Note: in your original query you just remove DISTINCT and use ARRAY_AGG instead of STRING_AGG, but then in outer query you process this array to form list of distinct values from that array
Below is answer on your updated question
You can simply use GROUP BY as in below example
#standardSQL
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
If to apply to sample data from your question, as in below example
#standardSQL
WITH name_table AS (
SELECT 1 id, 'a' institution UNION ALL
SELECT 1, 'b' UNION ALL
SELECT 2, 'a' UNION ALL
SELECT 2, 'c' UNION ALL
SELECT 3, 'a' UNION ALL
SELECT 1, 'a'
)
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
result is
Row id count_institution list_institution
1 1 2 a,b
2 2 2 a,c
3 3 1 a
Upvotes: 7
Reputation: 1269445
You can easily work around this:
SELECT id, institution,
COUNT(DISTINCT institution) OVER (PARTITION BY id) as list_intitution
STRING_AGG(CASE WHEN seqnum = 1 THEN institution END, ',') OVER (PARTITION BY id) as list_intitution
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
FROM name_table
WHERE DATE(created_at) = '2020-02-02'
) t
Upvotes: 0