Pritam

Reputation: 23

How to create an external table in BigQuery using Dataproc PySpark

My use case includes creating an external table in BigQuery using PySpark code. The data source is a Google Cloud Storage bucket where JSON data is sitting. I am reading the JSON data into a DataFrame and want to create an external BigQuery table. As of now, the table is getting created, but it is not an external one.

# Writes the DataFrame to BigQuery via the spark-bigquery connector;
# this creates a managed table, not an external one.
df_view.write \
    .format("com.google.cloud.spark.bigquery") \
    .option("table", "xyz-abc-abc:xyz_zone.test_table_yyyy") \
    .option("temporaryGcsBucket", "abcd-xml-abc-warehouse") \
    .save(mode="append", path="gs://xxxxxxxxx/")

P.S. - I am using the spark-bigquery connector to achieve my goal.

Please let me know if anyone has faced the same issue.

Upvotes: 2

Views: 766

Answers (1)

David Rabinowitz

Reputation: 30448

At the moment the spark-bigquery-connector does not support writing to an external table. Please create an issue and we will try to add it soon.

You can of course do it in two steps (a sketch follows the list):

  • Write the JSON files to GCS.
  • Use the BigQuery API to create the external table.
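A minimal sketch of both steps, assuming the google-cloud-bigquery client library and newline-delimited JSON files; the project, dataset, table, and bucket names are the placeholders from the question, not verified values:

from google.cloud import bigquery

# Step 1: write the DataFrame as newline-delimited JSON to GCS
# (the layout BigQuery expects for JSON external tables).
df_view.write.mode("append").json("gs://xxxxxxxxx/")

# Step 2: define an external table over those files.
client = bigquery.Client(project="xyz-abc-abc")

external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://xxxxxxxxx/*"]
external_config.autodetect = True  # let BigQuery infer the schema

table = bigquery.Table("xyz-abc-abc.xyz_zone.test_table_yyyy")
table.external_data_configuration = external_config
client.create_table(table)  # the resulting table is EXTERNAL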

Upvotes: 1
