Reputation: 85
When does BigQuery flatten an intermediate result set? I was under the impression that it was only when FLATTEN
was invoked, but I've encountered an example where the result is flattened without a FLATTEN
.
This is the case - this base query returns one record:
select count(*) from publicdata:samples.trigrams
where ngram = 'der Griindung im'
+-----+
| f0_ |
+-----+
| 1 |
+-----+
When queried, you can see that the record has a repeated field that is repeated twice.
select * from publicdata:samples.trigrams
where ngram = 'der Griindung im'
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
| ngram | first | second | third | fourth | fifth | cell_value | cell_volume_count | cell_volume_fraction | cell_page_count | cell_match_count | cell_sample_id | cell_sample_text | cell_sample_title | cell_sample_subtitle | cell_sample_authors | cell_sample_url |
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
| der Griindung im | der | Griindung | im | NULL | NULL | 2007 | 54 | 0.008746355685131196 | 54 | 54 | NULL | NULL | NULL | NULL | NULL | NULL |
| der Griindung im | der | Griindung | im | NULL | NULL | 2008 | 47 | 0.007612568837058633 | 47 | 47 | NULL | NULL | NULL | NULL | NULL | NULL |
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
When I add a filter on cell.value
, I get two records instead of one - but I never flattened so I'm not sure about the behavior here. My expectation is that this would return the same output as the previous COUNT
above. It doesn't:
select count(*) from publicdata:samples.trigrams
where ngram = 'der Griindung im' and cell.value in ('2007', '2008')
+-----+
| f0_ |
+-----+
| 2 |
+-----+
What this means is that while I expect select * from publicdata:samples.trigrams where ngram = 'der Griindung im'
and select * from publicdata:samples.trigrams where ngram = 'der Griindung im' and cell.value in ('2007', '2008')
to return the same output, they don't because one is implicitly flattened and the other is not. While this may not seem like a huge issue, this could matter significantly if it was part of a nested query that expected an intermediate result to be flattened or repeated.
Under what conditions does BigQuery flatten results without an explicit FLATTEN
?
Upvotes: 1
Views: 270
Reputation: 1244
Short story: use count(0)
instead of count(*)
. (You get 1
instead of 2
.)
count(*)
behaves strangely with repeated fields. It looks like the results are flattened, but if that were really the case, this should also affect count(0)
. I've asked about this here, but I haven't so far received a full explanation.
Upvotes: 1
Reputation: 173190
Let me answer first, how to get correct count in this case:
So instead of
SELECT COUNT(*)
FROM [publicdata:samples.trigrams]
WHERE ngram = 'der Griindung im'
AND cell.value IN ('2007', '2008')
with result of
+-----+
| f0_ |
+-----+
| 2 |
+-----+
you should do
SELECT COUNT(*)
FROM [publicdata:samples.trigrams]
WHERE ngram = 'der Griindung im'
OMIT RECORD IF EVERY(cell.value NOT IN ('2007', '2008'))
with result of
+-----+
| f0_ |
+-----+
| 1 |
+-----+
as I think what you expected
Secondly - Under what conditions does BigQuery flatten results without an explicit FLATTEN?
I think (just my guess baseed on BQ behavior observation) every time you explicitelly reference record's field within clauses like SELECT or WHERE , it gets automatically flattened for you. Using FLATTEN operator helps "control" this process.
Upvotes: 1