Reputation: 2372
I am reading the documentation both for Google Cloud Dataproc and for Apache Spark in general, and I am unable to figure out how to manually set the number of partitions when using the BigQuery connector.
The RDD is created using newAPIHadoopRDD, and my strong suspicion is that the partition count can be set via the configuration that is passed to this function. But I can't actually figure out what the possible values for the configuration are. Neither the Spark documentation nor the Google documentation seems to specify or link to the Hadoop job configuration specification.
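For reference, the RDD creation looks roughly like this (a minimal sketch only; the project, dataset, and table ids are placeholders, and BigQueryConfiguration.configureBigQueryInput is assumed to be the connector helper used to populate the Hadoop configuration):

import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("bq-partitions"))
val conf = sc.hadoopConfiguration

// Placeholder project/dataset/table ids.
conf.set(BigQueryConfiguration.PROJECT_ID_KEY, "my-project")
BigQueryConfiguration.configureBigQueryInput(conf, "my-project:my_dataset.my_table")

// Is there a configuration key here that controls the number of partitions?
val tableRdd = sc.newAPIHadoopRDD(
  conf,
  classOf[GsonBigQueryInputFormat],
  classOf[LongWritable],
  classOf[JsonObject])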
Is there a way to set the partitions upon the creation of this RDD or do I just need to repartition it as the next step?
Upvotes: 0
Views: 386
Reputation: 68
You need to repartition in your Spark code, for example:
import com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable

val REPARTITION_VALUE = 24

val rdd = sc.newAPIHadoopRDD(conf, classOf[GsonBigQueryInputFormat], classOf[LongWritable], classOf[JsonObject])

// f and g are placeholders for your own record-level and group-level transformations
rdd.map(x => f(x))
  .repartition(REPARTITION_VALUE)   // reset the partition count after the initial map
  .groupBy(_._1)
  .map(tup2 => g(tup2._1, tup2._2.toSeq))
  .repartition(REPARTITION_VALUE)   // groupBy shuffles, so repartition again if needed
And so on ...
When you work with RDDs, you have to manage the partitioning yourself.
Solution: the best approach is to work with the Dataset or DataFrame API.
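If the DataFrame route is an option, a minimal sketch could look like the following (assuming the spark-bigquery connector is on the classpath; the table id is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bq-dataframe-partitions")
  .getOrCreate()

// Controls how many partitions shuffles (groupBy, join, ...) produce
// for DataFrames/Datasets.
spark.conf.set("spark.sql.shuffle.partitions", "24")

// "my-project.my_dataset.my_table" is a placeholder table id.
val df = spark.read
  .format("bigquery")
  .option("table", "my-project.my_dataset.my_table")
  .load()

// You can still set an explicit partition count when the defaults don't fit.
val repartitioned = df.repartition(24)
println(repartitioned.rdd.getNumPartitions)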
Upvotes: 2