Joseph N
Joseph N

Reputation: 560

Bigquery table with GCS data source does not affect data changed into gcs

I am pretty new in bigquery. I have created bigquery table from gcp console where GCS CSV file is used as data source. I think when i delete any row, that should also be deleted from GCS file. But practically it's not happening.

Upvotes: 0

Views: 1516

Answers (2)

rmesteves
rmesteves

Reputation: 4085

As you can see in the image below, BigQuery supports three types of tables: Native, External and Views

enter image description here

When you create a Native Table, your data is fully imported into BigQuery's storage system and transformed in order to be optimized for queries. An External Table is basically a pointer to your source files. In other words, every time you run a query against an External Table, BigQuery access the original source of data (some file in GCS, Google Driver, etc..)

Given that, I can go directly to your question: BigQuery will not update the source files when you run some DML statement. If you run a DML statement (DELETE, UPDATE) against a Native table, the data inside BigQuery's storage system will be changed but the files will not be touched.

Also, DML is not supported in External Tables. If you try to run a DELETE statement agains an External Table for example you will get an error: DML over table 'project.dataset.table' is not supported.

I strongly recommend that you take a look in this documentation

Upvotes: 1

guillaume blaquiere
guillaume blaquiere

Reputation: 76053

When you use BigQuery, you have 2 ways to load data from GCS CSV file.

  1. The most common, is to perform a load job. This means that your CSV data are loaded (copied) into a BigQuery native table. After the load, there no link maintained between the file and the BigQuery data.

In this case, it's normal that the file doesn't change when you delete data into BigQuery

  1. You can define external table and query directly the data into your file hosted in GCS. It prevents the data duplication but the query are slower. In addition, DML (Data Manipulation Language) statements (INSERT, UPDATE, DELETE) aren't supported on external table.

Workaround

As workaround, your can use the solution 1:

Upvotes: 1

Related Questions