Reputation: 1537
I am trying to read a table from BigQuery using PySpark.
I have tried the following:
table = 'my-project-id.project-dataset.test_table_spark'
df = spark.read.format('bigquery').option('table', table).load()
However, I am getting this error
: java.lang.ClassNotFoundException: Failed to find data source: bigquery. Please find packages at http://spark.apache.org/third-party-projects.html
How can I read the BigQuery table from PySpark? (At the moment I'm using Python 2.)
Upvotes: 7
Views: 13879
Reputation: 251
You need to include the jar for the spark-bigquery-connector with your spark-submit. The easiest way to do that would be using the --jars flag to include the publicly available and most up-to-date version of the connector:
spark-submit --jars gs://spark-lib/bigquery/spark-bigquery-latest.jar my_job.py
Though the examples reference Cloud Dataproc, this should work when submitting to any Spark cluster.
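If you can't pass flags at submit time (for example, when launching from a plain Python script), the connector can also be pulled in via the spark.jars.packages session config, which resolves it from Maven. A minimal sketch, assuming the connector build for Scala 2.12 at version 0.36.1 — adjust the coordinate to match your Spark/Scala versions, and note the table name below is the one from the question:

```python
from pyspark.sql import SparkSession

# Pull the spark-bigquery-connector from Maven Central at session startup.
# The version/Scala suffix here is an assumption; pick one matching your cluster.
spark = (
    SparkSession.builder
    .appName('bigquery-read')
    .config('spark.jars.packages',
            'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1')
    .getOrCreate()
)

table = 'my-project-id.project-dataset.test_table_spark'
df = spark.read.format('bigquery').option('table', table).load()
df.printSchema()
```

This needs network access to Maven and credentials for BigQuery (e.g. via GOOGLE_APPLICATION_CREDENTIALS), so it will only run against a live project.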
Upvotes: 8