Owenn

Reputation: 1160

What is the best way to add new data to BigQuery through BigQuery API?

I'm using Django as my backend framework to connect my web app with BigQuery: I call the BigQuery API in views.py to fetch data from BQ. So far, my research has turned up two ways to add data to BQ from Django:

  1. Using the insert_rows_json() method, where I just need the data in JSON format and it appends the rows to the BQ table.
  2. Using the to_gbq() method, where the data needs to be in a pandas DataFrame and I can pass the parameter if_exists="replace" to update an existing table in BQ.

Currently, I use method 1 to add new data and method 2 for other operations such as updating and deleting.
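For context, here is a minimal sketch of how I call the two methods (the table ID, project ID, and field names are placeholders, not my real schema; the BigQuery and pandas imports are deferred into the functions that need them):

```python
# Minimal sketch of both methods; table/project/field names are placeholders.

def rows_from_values(values):
    # Shape plain dicts (e.g. from MyModel.objects.values()) into
    # JSON-serializable rows for insert_rows_json().
    return [{"name": v["name"], "score": int(v["score"])} for v in values]

def append_rows(table_id, rows):
    # Method 1: streaming insert; appends rows to an existing table.
    from google.cloud import bigquery  # needs credentials at runtime
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)  # e.g. "proj.dataset.table"
    if errors:
        raise RuntimeError(f"insert_rows_json failed: {errors}")

def replace_table(records, table, project_id):
    # Method 2: rewrite the whole table from a DataFrame.
    import pandas as pd
    df = pd.DataFrame(records)
    df.to_gbq(table, project_id=project_id, if_exists="replace")  # "dataset.table"
```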

My question: Is it better to use method 2 for all of my operations, or should I stick with method 1 for adding new data and method 2 for everything else?

Or is there another, more efficient approach that would make the web app run even faster?

Upvotes: 0

Views: 1476

Answers (1)

Vishal K

Reputation: 1464

Quoted from this doc:

For new projects, we recommend using the BigQuery Storage Write API instead of the tabledata.insertAll method. The Storage Write API has lower pricing and more robust features, including exactly-once delivery semantics. The tabledata.insertAll method is still fully supported.

  • You can try the BigQuery Storage Write API instead of the legacy insert_rows_json() method for streaming data into BigQuery; it has lower pricing and more robust features, including exactly-once delivery semantics. If you still need the legacy insert_rows_json() method, you can use it; it is still fully supported by Google Cloud.

  • The insert_rows_json() method remains a recommended option for streaming data into BigQuery and is maintained by Google Cloud.

  • You can also UPDATE and DELETE table data with DML queries through the BigQuery client libraries. However, BigQuery has limitations when UPDATE and DELETE queries run immediately after streaming inserts:

Rows that were written to a table recently by using streaming (the tabledata.insertAll method or the Storage Write API) cannot be modified with UPDATE, DELETE, or MERGE statements. The recent writes are those that occur within the last 30 minutes. All other rows in the table remain modifiable by using UPDATE, DELETE, or MERGE statements. The streamed data can take up to 90 minutes to become available for copy operations.
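Outside that streaming-buffer window, a parameterized DML statement through the client library can be sketched roughly like this (the table and column names below are invented examples, not from the question):

```python
# Hypothetical DML sketch; the table and column names are invented examples.

def build_update_sql(table_id):
    # Values are bound with query parameters rather than string formatting,
    # which avoids SQL injection.
    return f"UPDATE `{table_id}` SET score = @score WHERE name = @name"

def run_update(table_id, new_score, user_name):
    from google.cloud import bigquery  # needs credentials at runtime
    client = bigquery.Client()
    job = client.query(
        build_update_sql(table_id),
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("score", "INT64", new_score),
                bigquery.ScalarQueryParameter("name", "STRING", user_name),
            ]
        ),
    )
    job.result()  # block until the DML job finishes
    return job.num_dml_affected_rows
```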

  • If you still want to use the to_gbq() method for updating and deleting tables, you can. Refer here for the differences between the pandas-gbq and google-cloud-bigquery libraries.
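As a sketch of the practical difference between the two libraries: pandas-gbq takes a "dataset.table" destination plus a project_id, while google-cloud-bigquery's load_table_from_dataframe takes a fully qualified table ID and an explicit job config (the names below are placeholders):

```python
# Hypothetical comparison sketch; table and project names are placeholders.

def to_gbq_destination(full_table_id):
    # pandas-gbq wants "dataset.table"; strip a leading "project." if present.
    parts = full_table_id.split(".")
    return ".".join(parts[-2:])

def replace_with_pandas_gbq(df, full_table_id, project_id):
    # pandas-gbq: one call; if_exists="replace" drops and recreates the table.
    df.to_gbq(to_gbq_destination(full_table_id), project_id=project_id,
              if_exists="replace")

def replace_with_client_library(df, full_table_id):
    # google-cloud-bigquery: explicit load job; WRITE_TRUNCATE overwrites.
    from google.cloud import bigquery  # needs credentials at runtime
    client = bigquery.Client()
    job = client.load_table_from_dataframe(
        df,
        full_table_id,
        job_config=bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
    )
    job.result()
```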

Upvotes: 1
