Reputation: 171
I have a lot of json files in my bucket in GCS and I need to create a table for each one.
Normally, I do it manually in BigQuery: selecting the format (json), giving it a name and using automatically detected schema.
Is there any way of creating multiple tables at once using data from GCS?
Upvotes: 0
Views: 640
Reputation: 1978
In this answer you have a solution to recursively go through your bucket and load csv files to BQ. You can adapt this code with for instance:
gsutil ls gs://mybucket/**.json | \
xargs -I{} echo {} | \
awk '{n=split($1,A,"/"); q=split(A[n],B,"."); print "mydataset."B[1]" "$0}' | \
xargs -I{} sh -c 'bq --location=YOUR_LOCATION load --replace=false --autodetect --source_format=NEWLINE_DELIMITED_JSON {}'
This is if you want to run a load job in parallel manually.
If you want to add automation, you can use workflows as @Pentium10 recommends, or plug the Bash command into a Cloud Run instance coupled with a Scheduler for instance (you can look at this repo for inspiration)
Upvotes: 0
Reputation: 207982
Disclaimer: I have a blogpost authored on this topic at https://medium.com/p/54228d166a7d
Essentially you can leverage Cloud Workflows, to automate this process.
a sample workflow would be:
ProcessItem:
params: [project, gcsPath]
steps:
- initialize:
assign:
- dataset: wf_samples
- input: ${gcsPath}
# omitted parts for simplicity
- runLoadJob:
call: BQJobsInsertLoadJob_FromGCS
args:
project: ${project}
configuration:
jobType: LOAD
load:
sourceUris: ${gcsPath}
schema:
fields:
- name: "mydate"
type: "TIMESTAMP"
- name: "col1"
type: "FLOAT"
- name: "col2"
type: "FLOAT"
destinationTable:
projectId: ${project}
datasetId: ${dataset}
tableId: ${"table_"+output.index}
result: loadJobResult
- final:
return: ${loadJobResult}
BQJobsInsertLoadJob_FromGCS:
params: [project, configuration]
steps:
- runJob:
call: http.post
args:
url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs"}
auth:
type: OAuth2
body:
configuration: ${configuration}
result: queryResult
next: queryCompleted
- queryCompleted:
return: ${queryResult.body}
Upvotes: 1