jackrabbit

Reputation: 87

Move an entire bucket from Google Cloud Storage to BigQuery?

From the Google Cloud Storage to BigQuery transfers documentation I have been reading, I can see how to load data files one by one.

Is there a way to add an entire bucket into BigQuery?

The folders and files are logs for an organization. We're looking to create visualizations based on them, but first we need to get the bucket data into BigQuery...

Bucket Structure is as follows:

BucketName -> LogDate (ex. 20180623) -> all individual logs

Any ideas on how I can do this?

Upvotes: 1

Views: 958

Answers (2)

saifuddin778

Reputation: 7277

You can load those nested logs iteratively. For instance, if your logs are in CSV format with three fields each:

gsutil ls gs://mybucket/* | grep '\.csv$' | xargs -I {} bq --location=US load --source_format=CSV mydataset.mytable {} field_a:type_field_a,field_b:type_field_b,field_c:type_field_c

Here, note how the schema is specified inline as a comma-separated list of field_name:field_type pairs (with no spaces between entries), where the type can be any column type supported by BigQuery.
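As a concrete illustration, here is a minimal sketch of the same approach, assuming a hypothetical layout of gs://mybucket/LogDate/file.csv as in the question and three placeholder columns (log_time, user_id, message):

# Lists every CSV one level under each date folder and loads each file into the same table; bq load appends by default.
gsutil ls gs://mybucket/*/*.csv | xargs -I {} bq --location=US load --source_format=CSV mydataset.logs {} log_time:TIMESTAMP,user_id:STRING,message:STRING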

Upvotes: 1

Tamir Klein

Reputation: 3632

You can use a wildcard in your load job to achieve what you are looking for, as described in this link.

The relevant part of the documentation is this:

For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the bucket URI would be gs://mybucket/fed-sample*. This wildcard URI can then be used in the console, the classic UI, the CLI, or the API.
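Applied to the bucket layout in the question, a hedged sketch (bucket, dataset, table, and column names are placeholders, and it assumes all the logs share one CSV schema):

# One wildcard URI is intended to cover the objects under every date folder, so a single load job ingests the whole bucket.
bq --location=US load --source_format=CSV mydataset.logs "gs://mybucket/*" log_time:TIMESTAMP,user_id:STRING,message:STRING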

Upvotes: 1
