Lior
Lior

Reputation: 1457

BigQuery - duplicate rows in query with scoped aggregation

I'm trying to run a query that uses group_concat function with a scoped aggregation. The following query returns 175 rows, all with the same values. The duplication seems to be a result of having 175 elements in cell.value

SELECT
  ngram,
  group_concat(cell.sample.id) within record con
FROM [publicdata:samples.trigrams] 
where ngram = '! ! That'

When adding a new column to query above (count with scoped aggregation), the result is one row, as expected. The count row shows the value 175.

SELECT
  ngram,
  count(cell.value) within record cnt,
  group_concat(cell.sample.id) within record con
FROM [publicdata:samples.trigrams] 
where ngram = '! ! That'

Is seems that the row duplication occurs because there are no values to group (all nulls). If I change the group_concat to:

group_concat(if(cell.sample.id is null,'',cell.sample.id)) within record con

then once again there is only one row.

What is the reason for this?

How can this be avoided without resorting to group by on all columns (which will also require a subquery since it's impossible to combine group by and scoped aggregation functions)?

Upvotes: 2

Views: 487

Answers (1)

Jordan Tigani
Jordan Tigani

Reputation: 26617

This is a bug in the query engine.... it should only return 1 row. We're tracking it internally, and hopefully should have a fix soon.

Upvotes: 1

Related Questions