Reputation: 325
I've written a Dataflow job that works great when I run it manually. Here is the relevant section (with some validation code removed for clarity):
parser.add_argument('--end_datetime', dest='end_datetime')
known_args, pipeline_args = parser.parse_known_args(argv)

query = <redacted SQL String with a placeholder for a date>
query = query.replace('#ENDDATETIME#', known_args.end_datetime)

# pipeline options are built from the remaining command-line args
pipeline_options = PipelineOptions(pipeline_args)

with beam.Pipeline(options=pipeline_options) as p:
    rows = p | 'read query' >> beam.io.Read(
        beam.io.BigQuerySource(query=query, use_standard_sql=True))
Now I want to create a template and schedule it to run on a regular basis with a dynamic ENDDATETIME. As I understand it, in order to do this I need to change add_argument to add_value_provider_argument per this documentation:
https://cloud.google.com/dataflow/docs/templates/creating-templates
Unfortunately, it appears that ValueProvider values are not available when I need them; they're only accessible from inside the pipeline itself, at runtime (please correct me if I'm wrong here...). So I'm kind of stuck.
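For concreteness, here's a sketch of what I believe the templated version looks like after that change, and where it breaks. The UserOptions class name is just something I made up, and pipeline_args comes from my snippet above:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # templated (runtime) parameter instead of a plain argparse argument
        parser.add_value_provider_argument('--end_datetime')

user_options = PipelineOptions(pipeline_args).view_as(UserOptions)

# This is where I'm stuck: at template-creation time end_datetime is a
# RuntimeValueProvider with no value yet, so calling .get() here raises an
# error. The value only exists once a job is launched from the template.
query = query.replace('#ENDDATETIME#', user_options.end_datetime.get())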
Does anyone have any pointers on how I could get a dynamic date into my query in a Dataflow template?
Upvotes: 5
Views: 1797
Reputation: 251
Python currently only supports ValueProvider options for FileBasedSource IOs. You can see this by clicking the Python tab at the link you used, https://cloud.google.com/dataflow/docs/templates/creating-templates, under the "Pipeline I/O and runtime parameters" section.
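For example, a file-based read does accept a ValueProvider directly, so a sketch like this can be templated (the --input parameter and the TemplateOptions name are just illustrative):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class TemplateOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # runtime parameter, resolved when a job is launched from the template
        parser.add_value_provider_argument('--input')

options = PipelineOptions()
template_options = options.view_as(TemplateOptions)

with beam.Pipeline(options=options) as p:
    # ReadFromText is file-based, so it takes the ValueProvider as-is;
    # BigQuerySource does not, which is why the query can't be templated.
    lines = p | 'read input' >> beam.io.ReadFromText(template_options.input)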
Upvotes: 5