Reputation: 439
I'm using the spark-bigquery-connector to read from BigQuery, which uses the BigQuery Storage API under the hood. My script (automatically) requests multiple partitions from the BigQuery Storage API, but I get this warning:
WARN com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation: Requested 2 partitions, but only received 1 from the BigQuery Storage API
The Spark job takes a very long time, and I suspect it's because it isn't reading through multiple partitions in parallel. How can I make sure the BigQuery Storage API gives me all the partitions I ask for? What's happening here, and why is it only giving me a single partition, no matter how many I request?
First I create a SparkSession:
SparkSession spark = SparkSession.builder()
    .appName("XXX")
    .getOrCreate();
This is the code causing the WARN:
Dataset<Row> data = spark.read()
    .format("bigquery")
    .option("table", "project.dataset.table")
    .load()
    .cache();
Upvotes: 4
Views: 2840
Reputation: 30448
The spark-bigquery-connector uses heuristics to decide how many partitions to request from the BigQuery Storage API. The partitions actually returned are the ones BigQuery itself decides to use, which may be fewer than what the heuristic predicted. This is a normal situation, so a warning is perhaps too severe for it (I've discussed this with the BigQuery team as well). For further context, read the description of the requestedStreams parameter here.
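If you want to influence how many streams are requested in the first place, the connector exposes a parallelism hint. A minimal sketch, assuming the maxParallelism option available in recent connector versions; note it is only an upper bound, and the Storage API may still return fewer streams:

Dataset<Row> data = spark.read()
    .format("bigquery")
    .option("table", "project.dataset.table")
    // assumption: maxParallelism is supported by your connector version;
    // it caps the number of requested streams, BigQuery may return fewer
    .option("maxParallelism", "20")
    .load();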
The second issue is that the Spark job takes very long. If increasing the resources, especially the number of executors, doesn't help, please open a bug in the spark-bigquery-connector project with the actual stream ID and the rest of the Spark configuration, so that the connector and BigQuery teams can investigate it.
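As a starting point for increasing executors, a sketch using standard Spark properties (the values below are placeholders to tune for your cluster):

SparkSession spark = SparkSession.builder()
    .appName("XXX")
    // standard Spark properties; the numbers are illustrative only
    .config("spark.executor.instances", "8")
    .config("spark.executor.cores", "4")
    .getOrCreate();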
Upvotes: 1