vdolez

Reputation: 1118

Exception when reading BigQuery from Dataflow template using ValueProvider

I'm trying to create a Dataflow template that reads from BigQuery, but I get an exception when building the template:

An exception occured while executing the Java class. Cannot call validate if table is dynamically set.

Reading the documentation, it seems there is a special method to call when reading from BigQuery in a batch template:

Note: If you want to run a batch pipeline that reads from BigQuery, you must use .withTemplateCompatibility() on all BigQuery reads.

So, here's my code snippet:

// options.getBigQueryDiscountPath() is a template parameter (ValueProvider<String>)
PCollection<Discount> discountFromBigQuery = p.apply("Parse Discounts from BigQuery",
    BigQueryIO.read((SerializableFunction<SchemaAndRecord, Discount>) record -> {
            GenericRecord row = record.getRecord();
            return new Discount(row);
        })
        .withTemplateCompatibility()
        .from(options.getBigQueryDiscountPath())
        .withCoder(SerializableCoder.of(Discount.class)));

Note that options.getBigQueryDiscountPath() returns a ValueProvider<String>.
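This matters because a ValueProvider that backs a template parameter has no value while the template is being built. It only gets one when the template is launched. The toy model below is plain Java and not Beam's classes. It illustrates that distinction, assuming a simplified interface with get() and isAccessible(), the two methods Beam's ValueProvider also exposes. Provider, StaticProvider, RuntimeProvider, launchWith() and the table spec are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class ValueProviderSketch {

    // Simplified, hypothetical stand-in for Beam's ValueProvider<T> (not Beam code).
    interface Provider<T> {
        T get();
        boolean isAccessible();
    }

    // Value fixed when the pipeline is constructed (compare StaticValueProvider).
    static final class StaticProvider<T> implements Provider<T> {
        private final T value;
        StaticProvider(T value) { this.value = value; }
        public T get() { return value; }
        public boolean isAccessible() { return true; }
    }

    // Value that only exists once the template is launched (compare RuntimeValueProvider).
    static final class RuntimeProvider implements Provider<String> {
        private final String optionName;
        private Map<String, String> runtimeOptions; // null while the template is being built

        RuntimeProvider(String optionName) { this.optionName = optionName; }

        // Simulates launching the template with concrete parameter values.
        void launchWith(Map<String, String> options) { this.runtimeOptions = new HashMap<>(options); }

        public boolean isAccessible() { return runtimeOptions != null; }

        public String get() {
            if (!isAccessible()) {
                throw new IllegalStateException(optionName + " is only available at run time");
            }
            return runtimeOptions.get(optionName);
        }
    }

    public static void main(String[] args) {
        RuntimeProvider discountPath = new RuntimeProvider("bigQueryDiscountPath");
        // Template construction: the value does not exist yet.
        System.out.println("accessible while building template: " + discountPath.isAccessible());
        // Template launch: the parameter now has a (hypothetical) table spec.
        Map<String, String> params = new HashMap<>();
        params.put("bigQueryDiscountPath", "my-project:my_dataset.discounts");
        discountPath.launchWith(params);
        System.out.println("accessible at launch: " + discountPath.isAccessible() + ", value = " + discountPath.get());
    }
}
```

Any code that needs the table name at construction time, such as a validation step, therefore cannot work with a runtime-only parameter.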

So, how can I get rid of this error and make the BigQuery read part of the template?

Here are the Maven dependencies I use:

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.dataflow</groupId>
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
    <version>2.5.0</version>
</dependency>

Upvotes: 3

Views: 2150

Answers (2)

Booo

Reputation: 89

By the way, withoutValidation() needs to be added to the read chain as well, for example at the end, like below:

    // queryString is of type ValueProvider<String>
    PCollection<TableRow> rowsFromBigQuery = pipeline.apply(
                BigQueryIO.readTableRows()
                        .fromQuery(queryString)
                        .usingStandardSql()
                        .withMethod(options.getReadMethod())
                        .withoutValidation());

Upvotes: 1

ch_mike

Reputation: 1576

I believe the error you are facing is defined here. Please note the explanation, which mentions:

Note that a table or query check can fail if the table or dataset are created by earlier stages of the pipeline or if a query depends on earlier stages of a pipeline.

To overcome this, try adding .withoutValidation() to your BigQueryIO.read() call.
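To see why this helps, here is a toy model in plain Java, not Beam's actual classes. It assumes a read specification that, like BigQueryIO's read, runs a validation check while the pipeline graph is being constructed. That check refuses to run when the table is only known at run time. Param, ReadSpec, runtimeOnly() and validateAtConstruction() are hypothetical names. Only the error message comes from the question.

```java
import java.util.Objects;

final class WithoutValidationSketch {

    // Hypothetical stand-in for a deferred template parameter (Beam's real type is ValueProvider).
    interface Param<T> {
        T get();
        boolean isAccessible();
    }

    // A parameter whose value is unknown while the template is being built.
    static <T> Param<T> runtimeOnly() {
        return new Param<T>() {
            public T get() { throw new IllegalStateException("not available yet"); }
            public boolean isAccessible() { return false; }
        };
    }

    // Hypothetical immutable read spec, loosely modeled on BigQueryIO's read:
    // each with*() method returns a modified copy.
    static final class ReadSpec {
        private final Param<String> table;
        private final boolean validate;

        private ReadSpec(Param<String> table, boolean validate) {
            this.table = table;
            this.validate = validate;
        }

        static ReadSpec from(Param<String> table) {
            return new ReadSpec(Objects.requireNonNull(table), true);
        }

        ReadSpec withoutValidation() { return new ReadSpec(table, false); }

        // Called while the pipeline graph (i.e. the template) is being constructed.
        void validateAtConstruction() {
            if (!validate) {
                return; // check skipped; problems surface when the job runs instead
            }
            if (!table.isAccessible()) {
                throw new IllegalStateException("Cannot call validate if table is dynamically set.");
            }
            // A real implementation would check here that the table exists.
        }
    }

    public static void main(String[] args) {
        Param<String> discountPath = runtimeOnly();
        try {
            ReadSpec.from(discountPath).validateAtConstruction();
        } catch (IllegalStateException e) {
            System.out.println("With validation: " + e.getMessage());
        }
        ReadSpec.from(discountPath).withoutValidation().validateAtConstruction();
        System.out.println("Without validation: template construction succeeds");
    }
}
```

The trade-off is that skipping validation only postpones the check. If the table passed at launch is missing or misspelled, the job fails when it runs rather than when the template is built.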

Upvotes: 3
