Reputation: 348
I need to run a dynamic query to BigQuery in my Apache Beam pipeline. The query should be evaluated at runtime based on a value in the message. i.e select * from mytable where mycolumn = << dynamic value >>
I can't seem to get the Apache Beam io connectors to work with the dynamic query. Ideally the pipeline would look something like this:
from apache_beam import Create, Pipeline
from apache_beam.io.gcp.bigquery import ReadFromBigQuery
...
with Pipeline(argv=pipeline_args) as p:
p | Create([{"foo": "bar"}]
| "BQ" >> ReadFromBigQuery(
query=f"select * from mytable where mycolumn = {message['foo']}"
)
Here are the docs of [apache_beam.io.gcp.bigquery.ReadFromBigquery][1]
.
Is there a way I can accomplish this?
Upvotes: 0
Views: 1498
Reputation: 2670
You can use ReadAllFromBigQuery
.
Example:
sql = "SELECT unique_key FROM `bigquery-public-data.austin_311.311_service_requests` WHERE street_number='{number}'"
def to_bq_request(element, query):
from apache_beam.io import ReadFromBigQueryRequest
return ReadFromBigQueryRequest(query=query.format(number=element))
(p | Create([14301, 1580])
| Map(to_bq_request, query=sql)
| ReadAllFromBigQuery()
| Map(logging.info)
)
Upvotes: 3