Reputation: 988
I am try to use BigQuery to JOIN
over a range of tables using Wildcard Tables
.
The query works when all tables matched by the wildcard have the column fooid
(bar*.fooid
). However this column is a recent addition and when the table wildcard matches tables where the field does not exist the query fails.
Error: Cannot read non-required field 'fooid' as required INT64.
This is a simplified version of the query to demonstrate the issue, it would be selecting more columns from both foo and bar.
SELECT foo.foo_id AS foo
FROM `bar.bar*` AS bar_alias
LEFT JOIN bar.foo_map foo ON (bar_alias.fooid = foo.foo_id)
WHERE (_TABLE_SUFFIX BETWEEN '20170206' AND '20170208')
I've looked at a number of answers including BigQuery IF field exists THEN, but can't see how to use them in conjunction with a JOIN
or when the tables without the column are not known.
Upvotes: 0
Views: 2485
Reputation: 33705
Here is an example of how this can arise, and how to fix it by using a reference schema from an empty table where the column/field is NULLABLE
. Suppose I have the following two tables:
$ bq query --use_legacy_sql=false \
"CREATE TABLE tmp_elliottb.bar20180328 (y STRING) AS SELECT 'bar';"
$ bq query --use_legacy_sql=false \
"CREATE TABLE tmp_elliottb.bar20180329 " \
"(x INT64 NOT NULL, y STRING) AS SELECT 1, 'foo';"
Column x
has the NOT NULL
attribute in the second table, but the column is missing from the first table. I get an error when I try to use a table wildcard:
$ bq query --use_legacy_sql=false \
"SELECT * FROM \`tmp_elliottb.bar*\` " \
"WHERE _TABLE_SUFFIX BETWEEN '20180301' AND '20180329';"
Waiting on <job id> ... (0s) Current status: DONE
Error in query string: Error processing job '<job id>': Cannot read non-required field 'x' as required INT64.
Failure details:
- query: Cannot read non-required field 'x' as required INT64.
This makes sense--I said that x
is NOT NULL
, but the bar20180328
table doesn't have the column. Now if I create a new table that matches the *
expansion, but where the column doesn't have NOT NULL
:
$ bq query --use_legacy_sql=false \
"CREATE TABLE tmp_elliottb.bar_empty (x INT64, y STRING);"
$ bq query --use_legacy_sql=false \
"SELECT * FROM \`tmp_elliottb.bar*\` " \
"WHERE _TABLE_SUFFIX BETWEEN '20180301' AND '20180329';"
...
+------+-----+
| x | y |
+------+-----+
| 1 | foo |
| NULL | bar |
+------+-----+
I get results instead of an error. In your case, you need to create a table with the expected schema called bar_empty
, for example, but where none of the fields/columns that are missing for other tables have a NOT NULL
attribute.
With that said, I would strongly recommend using a partitioned table instead, if possible. Among other benefits, partitioned tables are much nicer to work with because they have a consistent schema across all days.
Upvotes: 2