Fabulini

Reputation: 171

Is there any way to create multiple tables in BigQuery at once?

I have a lot of JSON files in my GCS bucket and I need to create a table for each one.

Normally, I do this manually in BigQuery: selecting the format (JSON), giving the table a name, and using the automatically detected schema.

Is there any way of creating multiple tables at once using data from GCS?

Upvotes: 0

Views: 640

Answers (2)

Cylldby

Reputation: 1978

In this answer there is a solution that recursively goes through your bucket and loads CSV files into BQ. You can adapt that code, for instance:

# List every JSON file in the bucket, turn each path into
# "mydataset.<file-name-without-extension> <gcs-uri>",
# then run one bq load job per line:
gsutil ls gs://mybucket/**.json | \
    awk '{n=split($1,A,"/"); split(A[n],B,"."); print "mydataset."B[1]" "$0}' | \
    xargs -I{} sh -c 'bq --location=YOUR_LOCATION load --replace=false --autodetect --source_format=NEWLINE_DELIMITED_JSON {}'

This is an option if you want to kick off all the load jobs manually in one go; note that xargs runs them one at a time by default.
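
If you want the jobs to actually run in parallel, a small variation is possible, assuming GNU xargs (which supports the -P flag) to fork several bq processes at once:

# Same pipeline, but up to four load jobs run concurrently:
gsutil ls gs://mybucket/**.json | \
    awk '{n=split($1,A,"/"); split(A[n],B,"."); print "mydataset."B[1]" "$0}' | \
    xargs -P 4 -I{} sh -c 'bq --location=YOUR_LOCATION load --replace=false --autodetect --source_format=NEWLINE_DELIMITED_JSON {}'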

If you want to add automation, you can use Workflows as @Pentium10 recommends, or plug the Bash command into a Cloud Run job triggered by Cloud Scheduler, for instance (you can look at this repo for inspiration).
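
As a rough sketch of such an entrypoint script (the bucket, dataset, and environment variable names here are placeholders, not from the original setup):

#!/usr/bin/env bash
# Hypothetical Cloud Run entrypoint: loads every JSON file under
# BUCKET_PREFIX into its own table in DATASET.
set -euo pipefail
BUCKET_PREFIX="${BUCKET_PREFIX:-gs://mybucket}"
DATASET="${DATASET:-mydataset}"

for uri in $(gsutil ls "${BUCKET_PREFIX}/**.json"); do
  file="${uri##*/}"      # file name without the path
  table="${file%%.*}"    # table name = file name without the extension
  bq --location="${LOCATION:-US}" load --replace=false --autodetect \
    --source_format=NEWLINE_DELIMITED_JSON "${DATASET}.${table}" "${uri}"
done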

Upvotes: 0

Pentium10

Reputation: 207982

Disclaimer: I authored a blog post on this topic at https://medium.com/p/54228d166a7d

Essentially, you can leverage Cloud Workflows to automate this process.

A sample workflow would be:

ProcessItem:
  params: [project, gcsPath]
  steps:
    - initialize:
        assign:
          - dataset: wf_samples
          - input: ${gcsPath}
# omitted parts for simplicity 
    - runLoadJob:
        call: BQJobsInsertLoadJob_FromGCS
        args:
          project: ${project}
          configuration:
            jobType: LOAD
            load:
              sourceUris: ${gcsPath}
              schema:
                fields:
                  - name: "mydate"
                    type: "TIMESTAMP"
                  - name: "col1"
                    type: "FLOAT"
                  - name: "col2"
                    type: "FLOAT"
              destinationTable:
                projectId: ${project}
                datasetId: ${dataset}
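                # output.index comes from the iteration logic omitted above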
                tableId: ${"table_"+output.index}
        result: loadJobResult
    - final:
        return: ${loadJobResult}
BQJobsInsertLoadJob_FromGCS:
  params: [project, configuration]
  steps:
    - runJob:
        call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs"}
          auth:
            type: OAuth2
          body:
            configuration: ${configuration}
        result: queryResult
        next: queryCompleted
    - queryCompleted:
        return: ${queryResult.body}
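
To try it out, you can deploy and run the workflow with gcloud (the workflow name and location below are placeholders):

gcloud workflows deploy load-gcs-tables --source=workflow.yaml --location=us-central1
gcloud workflows run load-gcs-tables --location=us-central1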

Upvotes: 1
