Hafizur Rahman

Reputation: 2372

How to provide authentication to Google Cloud remote instance for BigQuery?

I am working on having the data collection part done by a remote machine. What I mean is that the remote instance will use my Python code to run a query against BigQuery and fetch the data, and then start an ML job on it. All of these steps can be done by a script on my local machine, but what is the best way of getting the data on the cloud instance (remotely) in which my Python module "trainer.task.py" is being executed?

Upvotes: 0

Views: 180

Answers (1)

Hafizur Rahman

Reputation: 2372

The idea is to use service account credentials for querying data from BigQuery, as discussed in Authenticating With a Service Account Key File.

The automated process can be handled through a bash script (let's call it "deploy_cloud.sh") that contains all the config params, as follows:

export JOB_NAME="your_job_$(date +%Y%m%d_%H%M%S)"
export BUCKET_NAME=your_job
export CLOUD_FOLDER=test-folder
export JOB_DIR=gs://$BUCKET_NAME/$CLOUD_FOLDER/$JOB_NAME
export REGION=your_region  # e.g. us-central1
export CLOUD_SECRET_FOLDER=credentials
export SERVICE_ACC_SECRET=service_account_secret.json
# copy your service account credential to cloud storage
# you can do this step manually in your storage
gsutil cp -r path_to_your_service_account_file/$SERVICE_ACC_SECRET gs://$BUCKET_NAME/$CLOUD_FOLDER/$CLOUD_SECRET_FOLDER/$SERVICE_ACC_SECRET
# define your query params
export YOUR_QUERY_PARAM1="provide_param1_value"
export YOUR_QUERY_PARAM2="provide_param2_value"
export YOUR_QUERY_PARAM3="provide_param3_value"
export YOUR_QUERY_PARAM4="provide_param4_value"
# Then submit your training job 

gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir $JOB_DIR \
--runtime-version 1.4 \
--module-name trainer.task \
--package-path ./trainer \
--region $REGION \
--config=trainer/config-gpu.yaml \
-- \
--client-sec gs://$BUCKET_NAME/$CLOUD_FOLDER/$CLOUD_SECRET_FOLDER/$SERVICE_ACC_SECRET \
--query-param1 $YOUR_QUERY_PARAM1 \
--query-param2 $YOUR_QUERY_PARAM2 \
--query-param3 $YOUR_QUERY_PARAM3 \
--query-param4 $YOUR_QUERY_PARAM4 

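The gcloud command above passes --config=trainer/config-gpu.yaml, which is not shown here. A minimal sketch of what such a config could look like, assuming a single GPU worker (the scale tier and machine type are assumptions; adjust them to your needs):

# trainer/config-gpu.yaml (sketch)
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
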
In your task.py file, you will need to use an argument parser to consume those passed values and build your query accordingly. Also in task.py, use the following code to read the service account file from your storage bucket and write it to the cloud instance's local disk:

import json

from tensorflow.python.lib.io import file_io

def read_and_write_client_secret(input_file, output_file):
    # read the service account JSON from its gs:// location
    with file_io.FileIO(input_file, 'r') as f:
        secret = json.load(f)

    # write it to the instance's local disk so the BigQuery client can read it
    with open(output_file, 'w') as outfile:
        json.dump(secret, outfile)

You can invoke the above function (in your task.py) as follows:

read_and_write_client_secret(args_dict['client_sec'], 'temp.json')

assuming your argument parser stores all your params in a dictionary called args_dict. Note that your service account file will be written as 'temp.json' in the cloud instance's current working directory.
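
For reference, a minimal sketch of what such an argument parser could look like in task.py (the flag names match the gcloud command above; --job-dir is accepted as well because ML Engine passes it through to your module):

import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--job-dir')  # passed through by ML Engine
    parser.add_argument('--client-sec', required=True,
                        help='gs:// path to the service account JSON')
    parser.add_argument('--query-param1')
    parser.add_argument('--query-param2')
    parser.add_argument('--query-param3')
    parser.add_argument('--query-param4')
    return vars(parser.parse_args())  # plain dict, i.e. args_dict

args_dict = parse_args()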

Now you need to specify the file location ('temp.json', which contains the credentials) when creating your BigQuery client, as follows:

from google.cloud import bigquery

# build the query string from the parsed parameters
# (build_my_query is your own helper that assembles the SQL)
my_query = build_my_query(args_dict['query_param1'], args_dict['query_param2'],
                          args_dict['query_param3'], args_dict['query_param4'])

bigquery_client = bigquery.Client.from_service_account_json('temp.json')
query_job = bigquery_client.query(my_query)
results = query_job.result()  # waits for the query job to complete

# suggestion: extract data from iterator and create a pandas data frame for further processing
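
Following that suggestion, one way to build a pandas DataFrame from the result iterator (assuming pandas is available on the training instance, e.g. listed as a dependency in your trainer package's setup.py):

import pandas as pd

# each BigQuery Row behaves like a mapping, so dict(row) yields {column: value}
df = pd.DataFrame([dict(row) for row in results])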

Now your cloud instance will be able to run your query and use the resulting data, and you are ready to start a machine learning job on it.

Upvotes: 1
