Reputation: 87
From the Google Cloud Storage to BigQuery transfers documentation I have been reading, I can see how to load data files one by one.
Is there a way to add an entire bucket into BigQuery?
The folders and files are logs for an organization. We're looking to create visualizations based on them, but first we need to get the bucket data into BigQuery...
Bucket Structure is as follows:
BucketName -> LogDate (ex. 20180623) -> all individual logs
Any ideas on how I can do this?
Upvotes: 1
Views: 958
Reputation: 7277
You can load those nested logs iteratively. For instance, if your logs are in CSV format with three fields each:
gsutil ls gs://mybucket/* | grep '.csv' | xargs -I {} bq --location=US load --source_format=CSV mydataset.mytable {} field_a:type_field_a,field_b:type_field_b,field_c:type_field_c
Here, note how the schema is specified inline in the format field_[x]:type_field_[x], where the type can be any column type supported by BigQuery. Note also that the schema must be passed as a single argument, with no spaces after the commas.
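As a minimal sketch of what that looks like with concrete values (the column names log_time, user_id, and message and the table mydataset.mytable are hypothetical, not from your bucket):

gsutil ls gs://mybucket/* | grep '\.csv$' | xargs -I {} bq --location=US load --source_format=CSV mydataset.mytable {} log_time:TIMESTAMP,user_id:STRING,message:STRING

With your BucketName -> LogDate -> logs layout, gsutil ls gs://mybucket/* lists the objects inside each date folder, so each log file gets its own load job; if the logs were ever nested more than one level deep, gs://mybucket/** would list them recursively instead.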
Upvotes: 1
Reputation: 3632
You can use a wildcard in your load job to achieve what you are looking for, as described in this link.
The relevant part of the documentation is this:
For example, if you have two files named fed-sample000001.csv and fed-sample000002.csv, the bucket URI would be gs://mybucket/fed-sample*. This wildcard URI can then be used in the console, the classic UI, the CLI, or the API.
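As a minimal sketch, assuming every object in the bucket is a CSV log with the same layout and a hypothetical target table mydataset.logs, a single load job covering every LogDate folder could look like:

bq --location=US load --source_format=CSV --autodetect mydataset.logs "gs://mybucket/*"

The quoted wildcard URI is expanded by BigQuery rather than the shell, so one job pulls in every matching object in the bucket (only one asterisk is allowed per URI). --autodetect is used here just to keep the sketch short; you can supply an explicit schema instead, as in the other answer.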
Upvotes: 1