Reputation: 909
The documentation states that the source_objects argument takes templated values. However, when I try the following:
gcs_to_bq_op = GoogleCloudStorageToBigQueryOperator(
    task_id=name,
    bucket='gdbm-public',
    source_objects=['entity/{{ ds_nodash }}.0.{}.json'.format(filename)],
    destination_project_dataset_table='dbm_public_entity.{}'.format(name),
    schema_fields=schema,
    source_format='NEWLINE_DELIMITED_JSON',
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_TRUNCATE',
    max_bad_records=0,
    allow_jagged_rows=True,
    google_cloud_storage_conn_id='my_gcp_conn',
    bigquery_conn_id='my_gcp_conn',
    delegate_to=SERVICE_ACCOUNT,
    dag=dag
)
I receive the error message:
Exception: BigQuery job failed. Final error was: {u'reason': u'notFound', u'message': u'Not found: URI gs://gdbm-public/entity/{ ds_nodash }.0.GeoLocation.json'}.
I found an example where the {{ ds_nodash }} variable is used in the same way, so I'm not sure why this doesn't work for me.
Upvotes: 2
Views: 1264
Reputation: 716
The issue is exactly as Dustin described: calling .format on the string removes one set of the double braces. Doubling the braces is one solution:
'entity/{{{{ ds_nodash }}}}.0.{}.json'.format(filename)
I find it is easier to format the string this way to avoid confusion:
"entity/{0}.0.{1}.json".format("{{ ds_nodash }}", filename)
Upvotes: 3
Reputation: 21550
The issue is that calling .format on the string removes one set of the double braces:
>>> 'entity/{{ ds_nodash }}.0.{}.json'.format(filename)
'entity/{ ds_nodash }.0.foobar.json'
You need to escape the braces that you want to be in the string by doubling them:
>>> 'entity/{{{{ ds_nodash }}}}.0.{}.json'.format(filename)
'entity/{{ ds_nodash }}.0.foobar.json'
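The escaping rule can be checked in a short standalone script. This is only a sketch: the filename value is an assumption taken from the error message in the question, and the variable names are made up for illustration. It also shows two ways to keep the Jinja expression intact without doubling the braces, which some may find easier to read.

```python
# Assumed value, taken from the error message in the question.
filename = "GeoLocation"

# What the question did: str.format treats {{ and }} as escaped braces,
# so the template marker collapses to single braces.
broken = "entity/{{ ds_nodash }}.0.{}.json".format(filename)

# Fix 1: double the braces so that literal {{ and }} survive formatting.
doubled = "entity/{{{{ ds_nodash }}}}.0.{}.json".format(filename)

# Fix 2: pass the Jinja expression in as an ordinary format argument.
as_argument = "entity/{0}.0.{1}.json".format("{{ ds_nodash }}", filename)

# Fix 3: avoid str.format altogether and concatenate.
concatenated = "entity/{{ ds_nodash }}.0." + filename + ".json"

print(broken)
print(doubled)
```

All three fixes produce the same string, so the choice between them is a matter of readability.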
Upvotes: 4