How to run BigQueryIO.read().fromQuery with parameters

Question

I need to run multiple queries from a single .sql file, each with different parameters. I've tried something like this, but it does not work because BigQueryIO.Read accepts only PBegin as input:

  public PCollection> expand(PCollection input) {

    PCollection> section1 = input.apply("Read Section1 from BQ",
                BigQueryIO
                        .readTableRows()
                        .fromQuery(ResourceRetriever.getResourceFile("query/test/section1.sql"))
                        .usingStandardSql()
                        .withoutValidation())
            .apply("Convert section1 to Dto", ParDo.of(new TableRowToSection1DtoFunction()));
  }

Is there any other way to pass parameters from an existing PCollection into my BigQueryIO.read() invocation?

Dmytro Pavlov · Accepted Answer

I've come up with the following solution: instead of BigQueryIO, use the regular GCP client library for accessing BigQuery. Because the BigQuery client is not Serializable, mark the field as transient and initialize it in a method annotated with @Setup, which runs on the worker for each DoFn instance:

public class DenormalizedCase1Fn extends DoFn<*> {

  private transient BigQuery bigQuery;

  @Setup
  public void initialize() {
    this.bigQuery = BigQueryOptions.newBuilder()
            .setProjectId(bqProjectId.get())
            .setLocation(LOCATION)

            .setRetrySettings(RetrySettings.newBuilder()
                    .setRpcTimeoutMultiplier(1.5)
                    .setInitialRpcTimeout(Duration.ofSeconds(5))
                    .setMaxRpcTimeout(Duration.ofSeconds(30))
                    .setMaxAttempts(3).build())
            .build().getService();
  }

  @ProcessElement 
  ...
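
To make the transient-plus-@Setup idea concrete, here is a minimal, standard-library-only sketch. It is not Beam or BigQuery code: FakeClient, QueryFn, setup(), process() and roundTrip() are hypothetical stand-ins for the BigQuery client, the DoFn, the @Setup and @ProcessElement methods, and the serialization a runner performs when it ships the function to workers. It only illustrates why the client field has to be rebuilt after deserialization and how each element can then be bound as a named query parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

public class TransientClientSketch {

  /** Hypothetical stand-in for a non-serializable client such as BigQuery. */
  static class FakeClient {
    // Pretends to run a parameterized query; it only renders what would be sent.
    String run(String sql, Map<String, String> params) {
      return sql + " " + params;
    }
  }

  /** Plays the role of the DoFn: serializable itself, but its client is not. */
  static class QueryFn implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sql;             // serialized along with the function
    private transient FakeClient client;  // skipped by serialization, null afterwards

    QueryFn(String sql) {
      this.sql = sql;
    }

    // Counterpart of the @Setup method: (re)create the client on the worker.
    void setup() {
      this.client = new FakeClient();
    }

    // Counterpart of @ProcessElement: the element becomes a named parameter.
    String process(String element) {
      return client.run(sql, Map.of("id", element));
    }

    boolean hasClient() {
      return client != null;
    }
  }

  /** Serializes and deserializes the function, similar to shipping it to a worker. */
  static QueryFn roundTrip(QueryFn fn) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(fn);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (QueryFn) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    QueryFn original = new QueryFn("SELECT * FROM t WHERE id = @id");
    original.setup();

    QueryFn shipped = roundTrip(original);
    System.out.println(shipped.hasClient());   // prints "false": the transient field was dropped

    shipped.setup();                           // what @Setup does on the worker
    System.out.println(shipped.process("42")); // prints "SELECT * FROM t WHERE id = @id {id=42}"
  }
}
```

In the real DoFn the same shape should carry over, assuming the google-cloud-bigquery client is used as in the answer: the query text stays a serializable field, and the @ProcessElement method would bind each element through something like QueryJobConfiguration.newBuilder(sql).addNamedParameter("id", QueryParameterValue.string(element)) before calling bigQuery.query(...). The parameter name "id" and the table t are made up for the illustration.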
