Reputation: 135
I am loading a dataset from BigQuery and after some transformations, I'd like to save the transformed DataFrame back into BigQuery. Is there a way of doing this?
This is how I am loading the data:
df = spark.read \
.format('bigquery') \
.option('table', 'publicdata.samples.shakespeare') \
.load()
Some transformations:
df_new = df.select("word")
And this is how I am trying to save the transformed data as a new table in my project:
df_new \
.write \
.mode('overwrite') \
.format('bigquery') \
.save('my_project.some_schema.df_new_table')
Is this even possible? Is there a way to save to BQ directly?
PS: I know the following works, but it is not exactly what I am looking for:
df_new \
.write \
.mode('overwrite') \
.format('csv') \
.save('gs://my_bucket/df_new.csv')
Thanks!
Upvotes: 8
Views: 25628
Reputation: 1
As you mentioned, the data is not written to BigQuery directly. It is first written to Google Cloud Storage and then loaded into BigQuery from there. To achieve this, set a staging bucket before the write statement:
bucket = "<your-bucket-name>"
spark.conf.set("temporaryGcsBucket", bucket)
wordCountDf.write.format('bigquery').option('table', 'projectname.dataset.table_name').save()
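For what it's worth, newer versions of the spark-bigquery-connector can also skip the GCS staging step entirely by writing through the BigQuery Storage Write API. A minimal sketch, assuming a connector version that supports the writeMethod option (the table name is a placeholder):

# Direct write via the BigQuery Storage Write API; no temporaryGcsBucket needed.
# Assumes a recent spark-bigquery-connector; save-mode support may vary by version.
df_new.write \
    .format('bigquery') \
    .option('writeMethod', 'direct') \
    .save('my_project.some_schema.df_new_table')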
Upvotes: 0
Reputation: 1004
Here is the documentation for the BigQuery connector with Spark
This is the recommended way:
# Saving the data to BigQuery
word_count.write.format('bigquery') \
.option('table', 'wordcount_dataset.wordcount_output') \
.save()
Note that the table is set in option() rather than passed to save().
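If you don't want to set the staging bucket globally on the Spark session, the connector also accepts it per write. A minimal sketch, assuming the connector's temporaryGcsBucket write option (the bucket name is a placeholder):

# Stage this write through GCS without touching spark.conf.
word_count.write.format('bigquery') \
    .option('table', 'wordcount_dataset.wordcount_output') \
    .option('temporaryGcsBucket', 'my-staging-bucket') \
    .save()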
Upvotes: 5
Reputation: 21
The following syntax will create or overwrite the table:
df.write.format('bigquery').option('table', 'project.db.tablename').mode('overwrite').save()
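The same pattern works with Spark's other save modes, e.g. appending instead of replacing. A minimal sketch with placeholder project, dataset, table, and bucket names:

# Append rows to an existing table rather than overwriting it.
df.write.format('bigquery') \
    .option('table', 'project.dataset.table_name') \
    .option('temporaryGcsBucket', 'my-staging-bucket') \
    .mode('append') \
    .save()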
For more information, you can refer to the following link: https://dbmstutorials.com/pyspark/spark-dataframe-write-modes.html
Upvotes: 0