sweeeeeet
sweeeeeet

Reputation: 1819

mystery on BigQuery views

Here is my mistery. On the console, when I compute this query, it works perfectly well:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy);

Then I save it as a view called dataset.myview, and run:

SELECT * FROM dataset.myview LIMIT 1000

But this raises the following error:

SELECT query which references non constant fields or uses aggregation functions or has one or more of WHERE, OMIT IF, GROUP BY, ORDER BY clauses must have FROM clause.

Nevertheless, when I try: SELECT * FROM dataset.myview, i.e. without the LIMIT, it works !!

And in fact, when I run my full query with the LIMIT at the bottom, It also raises the error:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000;

Nevertheless, when I add an internal ORDER BY, it computes well again:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id,
             1 AS dummy
      FROM bq_000010.000010_flux_visites_ds
      ORDER BY ds_id) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000

Upvotes: 2

Views: 212

Answers (1)

xQbert
xQbert

Reputation: 35323

What happens if you apply an order by to your select on the view? or do you require random results?

A query with a LIMIT clause may still be non-deterministic if there is no operator in the query that guarantees the ordering of the output result set. This is because BigQuery executes using a large number of parallel workers. The order in which parallel jobs return is not guaranteed.

I'm not sure why the order by here would make a difference. However, It's generally odd to see a limit without any order by; which is why I asked about order. A complete SWAG is that perhaps the parallel workers are completing the outer join and limit before the inner select is complete causing an internal error; and by applying an order by the system is forced to materialize the record before executing the inner join join.

But I really have ~~NO CLUE~

Upvotes: 1

Related Questions