Anonymous Duck

Reputation: 2978

Building a Cloud Storage file in chunks

I'm looking for an example of how to build a file in Cloud Storage incrementally. Here is my use case:

  1. A Java application queries BigQuery for data.
  2. Using BigQuery pagination, the data is pulled one page window at a time.
  3. Each chunk retrieved from BigQuery is persisted to Cloud Storage.
  4. After all chunks have been uploaded, the file upload is completed.
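The flow above could look roughly like the following sketch. It is only an illustration: the class `ChunkedFileBuilder`, the method `writePages`, and the sample rows are hypothetical, the pages are plain in-memory lists rather than real BigQuery results, and the target is a generic `WritableByteChannel` backed by memory rather than a Cloud Storage object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

class ChunkedFileBuilder {

    /** Writes every page to the channel, one line per row, and returns the row count. */
    static long writePages(Iterator<List<String>> pages, WritableByteChannel out) throws IOException {
        long rows = 0;
        while (pages.hasNext()) {
            // Build one chunk per page so only a single page is held in memory.
            StringBuilder chunk = new StringBuilder();
            for (String row : pages.next()) {
                chunk.append(row).append('\n');
                rows++;
            }
            ByteBuffer buffer = ByteBuffer.wrap(chunk.toString().getBytes(StandardCharsets.UTF_8));
            // A channel may accept fewer bytes than offered, so loop until the chunk is drained.
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for paginated query results (hypothetical sample data).
        Iterator<List<String>> pages = List.of(List.of("1,a", "2,b"), List.of("3,c")).iterator();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the channel is the "complete the file" step in this sketch.
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            System.out.println(writePages(pages, channel) + " rows written");
        }
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The point of the shape is that the writer never needs the whole file at once: each page is serialized and pushed to the channel before the next page is fetched.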

The challenge here is that Cloud Storage objects are immutable: once you have created an object in GCS, you can no longer reopen it to add data, only overwrite it. I explored the streaming and resumable upload features, but based on my understanding they need the file to be ready before uploading.

If this is not possible, my only option is to upload each chunk as a separate file and use the Cloud Storage compose feature to merge the chunks into one file. This is costly, because it takes multiple requests to GCS just to complete one file.

Upvotes: 0

Views: 307

Answers (1)

guillaume blaquiere

Reputation: 75775

If your final file format is CSV, JSONL (newline-delimited JSON), Avro, or Parquet, you can use the table export feature. Only one file is generated if you export less than 1 GB.

  • The Java application queries BigQuery and sinks the result into a temporary table:
CREATE TABLE `myproject.mydataset.mytemptable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
) AS
SELECT ....
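A minimal sketch of how the Java application might assemble this statement and the export destination. The class `TempTableExport`, its two helper methods, and the bucket and object names are hypothetical placeholders. Running the statement and exporting the table would go through the BigQuery client library (for example `BigQuery.query(...)` and `Table.extract(...)` in google-cloud-bigquery); those calls are left out so the sketch runs with the JDK alone.

```java
import java.util.Locale;

class TempTableExport {

    /** Builds the CREATE TABLE ... AS SELECT statement with an expiration of ttlHours. */
    static String buildTempTableSql(String tableId, int ttlHours, String selectSql) {
        return String.format(Locale.ROOT,
                "CREATE TABLE `%s`\n"
              + "OPTIONS(\n"
              + "  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL %d HOUR)\n"
              + ") AS\n"
              + "%s",
                tableId, ttlHours, selectSql);
    }

    /** Builds the gs:// URI that the table export would write to. */
    static String buildDestinationUri(String bucket, String objectName) {
        return "gs://" + bucket + "/" + objectName;
    }

    public static void main(String[] args) {
        // Placeholder query; the real SELECT comes from the application.
        String sql = buildTempTableSql("myproject.mydataset.mytemptable", 1, "SELECT 1 AS id");
        System.out.println(sql);
        // Assumed bucket and object name, for illustration only.
        System.out.println(buildDestinationUri("my-bucket", "exports/result.csv"));
    }
}
```

Because the table carries its own expiration, the application does not need a cleanup step after the export.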

Then export the temporary table to Cloud Storage. That's all.

Upvotes: 1
