stkvtflw

Reputation: 13507

How to trigger data upload from Cloud Storage to BigQuery from Kubernetes Engine?

An API server is running on Kubernetes Engine (GKE). Users can upload relatively small sets of data (~100 MB, multiple .csv files with the same structure) from client applications to Cloud Storage (GCS). Once an upload is complete, I need to import all data from all new .csv files into a single existing BigQuery table, adding some user-specific parameters (for example, marking each row with the user's ID). Order doesn't matter.

The Google docs offer GUI-based and command-line solutions for this. However, I assume there is a way to trigger the upload and track its progress from the GKE-based server itself. How do I do that?

Not sure if this is important: the GKE API server is written in Node.js.

Upvotes: 0

Views: 938

Answers (1)

Elliott Brossard

Reputation: 33705

Here is an example of importing a file from GCS into BigQuery, taken from the BigQuery documentation. You can configure the job as you need; there are a few references on that page and a link to the GitHub repo with additional functionality:

// Imports the Google Cloud client libraries
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";

// The ID of the dataset of the table into which data should be imported, e.g. "my_dataset"
// const datasetId = "my_dataset";

// The ID of the table into which data should be imported, e.g. "my_table"
// const tableId = "my_table";

// The name of the Google Cloud Storage bucket where the file is located, e.g. "my-bucket"
// const bucketName = "my-bucket";

// The name of the file from which data should be imported, e.g. "file.csv"
// const filename = "file.csv";

// Instantiates clients
const bigquery = BigQuery({
  projectId: projectId
});

const storage = Storage({
  projectId: projectId
});

let job;

// Imports data from a Google Cloud Storage file into the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .import(storage.bucket(bucketName).file(filename))
  .then((results) => {
    job = results[0];
    console.log(`Job ${job.id} started.`);

    // Wait for the job to finish
    return job.promise();
  })
  .then(() => {
    // Get the job's status
    return job.getMetadata();
  })
  .then((metadata) => {
    // Check the job's status for errors
    const errors = metadata[0].status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .then(() => {
    console.log(`Job ${job.id} completed.`);
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });
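
If you need to configure the load job, for example to skip a CSV header row or to append rather than overwrite, the method also accepts a metadata object. Here is a minimal sketch assuming a current release of the client libraries, where the factory calls shown above became classes and import() was renamed load(); the dataset, table, bucket, and file names are placeholders:

// Sketch only: assumes current @google-cloud/bigquery and
// @google-cloud/storage releases; all names below are placeholders.
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

async function loadCsv(datasetId, tableId, bucketName, filename) {
  // Load job configuration: CSV input, skip the header row,
  // append to the existing table instead of overwriting it.
  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    writeDisposition: 'WRITE_APPEND',
  };

  // load() waits for the job to finish before resolving.
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // Surface any errors recorded on the completed job.
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
  console.log(`Job ${job.id} completed.`);
}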

After uploading, you can run a query that queries the newly uploaded CSV file(s) and appends the result to the desired destination table.
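
For instance, you could load each batch into a staging table and then run an INSERT ... SELECT that appends the rows to the final table with the user's ID attached. A sketch, again assuming a current client release; the table names, columns, and the userId parameter are hypothetical:

// Sketch only: my_dataset.staging_table, my_dataset.final_table, and
// the column names are hypothetical. Copies freshly loaded rows into
// the destination table, tagging each row with the uploading user's
// ID via a named query parameter.
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

async function appendWithUserId(userId) {
  const query = `
    INSERT INTO \`my_dataset.final_table\` (user_id, col_a, col_b)
    SELECT @userId, col_a, col_b
    FROM \`my_dataset.staging_table\``;

  // createQueryJob starts the job; getQueryResults waits for it to
  // finish and throws if the job failed.
  const [job] = await bigquery.createQueryJob({
    query: query,
    params: {userId: userId},
  });
  await job.getQueryResults();
  console.log(`Append job ${job.id} completed.`);
}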

Upvotes: 1
