Evan Hecht

Trying to run a Cloud Function with LRO

Background: I am working on an autonomous, end-to-end Google AutoML system. I created a Cloud Function that receives a Cloud Pub/Sub message when training starts. The function uses the operation ID to get the status of the training operation. If training is complete (operation metadata = true), the function sends the model ID to a deployment function and publishes a Pub/Sub message with the model ID, so that the model can be called for predictions. I found a solution on SO in this post: How to programmatically get model id from google-cloud-automl with node.js client library

Problem: The issue I am running into is the Cloud Functions timeout, which is at most 9 minutes. I asked about potential solutions on Reddit: https://www.reddit.com/r/googlecloud/comments/jqr213/cloud_function_to_compute_engine/ The Compute Engine solution does not seem practical for a system written mainly as Cloud Functions. While trying to implement the cron job solution, I thought of the retry feature for Cloud Functions: it keeps the same event and retries the function for up to a week. The documentation for retries is https://cloud.google.com/functions/docs/bestpractices/retries How could I make the function fail on purpose so that it keeps retrying until the status becomes true, and then completes the deployment and sends the Pub/Sub message? My idea is to put the end of the pipeline in the if/else statement; I am just struggling to find documentation on this, or on whether it would actually work.
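A minimal sketch of the retry idea, assuming the function is deployed with "Retry on failure" enabled: for a Pub/Sub-triggered background function, an uncaught error or rejected promise marks the invocation as failed, and Cloud Functions delivers the same event again later. The handler below throws while the operation is still running and only deploys and publishes once it is done. It also drops events older than a cutoff, since the retries documentation recommends an end condition based on event age. makeHandler, checkOperation, deployModel and publishModelId are hypothetical names, passed in as parameters so the logic runs without Google Cloud; in a real function they would wrap the AutoML and Pub/Sub clients.

```javascript
'use strict';

// Hypothetical factory: builds a Pub/Sub-triggered handler that relies on the
// platform's "Retry on failure" setting instead of an in-process loop.
//   deps.checkOperation(name)    -> Promise<{done: boolean, modelId?: string}>
//   deps.deployModel(modelId)    -> Promise<void>
//   deps.publishModelId(modelId) -> Promise<void>
function makeHandler(deps, maxAgeMs = 24 * 60 * 60 * 1000) {
  return async (event, context) => {
    // End condition: stop retrying once the event is too old, so a stuck
    // operation cannot keep the function failing for the whole retry window.
    const ageMs = Date.now() - Date.parse(context.timestamp);
    if (ageMs > maxAgeMs) {
      console.log(`Dropping event ${context.eventId}: too old (${ageMs} ms).`);
      return 'expired';
    }

    // The payload is base64-encoded; strip the quotes the publisher adds,
    // as the question's code does, to get the operation name.
    const operationName = Buffer.from(event.data, 'base64')
      .toString()
      .replace(/"/g, '')
      .trim();
    const status = await deps.checkOperation(operationName);

    if (!status.done) {
      // Throwing marks this invocation as failed; with retries enabled,
      // the same event is delivered again later.
      throw new Error(`Operation ${operationName} still running; retrying.`);
    }

    await deps.deployModel(status.modelId);
    await deps.publishModelId(status.modelId);
    return 'deployed';
  };
}
```

Since the same event can be delivered more than once, deployModel and publishModelId should tolerate being called twice for the same model.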

Code

// Imports the Google Cloud AutoML and Pub/Sub client libraries
const {AutoMlClient} = require('@google-cloud/automl').v1;
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates the clients
const client = new AutoMlClient();
const pubSubClient = new PubSub();

exports.helloPubSub = (event, context) => {
  // The payload is the quoted operation name, e.g.
  // "projects/170974376642/locations/us-central1/operations/OPERATION_ID"
  const message = event.data
    ? Buffer.from(event.data, 'base64').toString()
    : 'Hello, World';
  console.log(message);
  // Full operation name without the surrounding quotes
  const operationName = message.replace(/"/g, '');
  // Bare operation ID
  const operationId = operationName.replace('projects/170974376642/locations/us-central1/operations/', '');
  console.log(`Operation ID is: ${operationId}`);
  // Return the promise so the function is not considered finished early
  return getOperationStatus(operationId, operationName);
};

async function getOperationStatus(opId, message) {
  console.log('Starting operation status');
  const opped = opId;
  const data = message;
  // Construct request: `message` is the full operation name
  const request = {
    name: message,
  };
  const [response] = await client.operationsClient.getOperation(request);

  console.log(`Name: ${response.name}`);
  console.log('Operation details:');
  const apple = JSON.stringify(response);
  console.log(apple);

  console.log('Loop until the model is ready to deploy');
  if (apple.includes('True')) {
    // Remove the "projects/.../locations/.../models/" prefix from the response string
    const appleF = apple.replace(/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//, '');
    await deployModelWithNodeCount(appleF);
    await pubSub(appleF);
  } else {
    // Not done yet: check again immediately (recursive, no delay)
    await getOperationStatus(opped, data);
  }
}

async function pubSub(id) {
  const topicName = 'modelID';
  const data = JSON.stringify({foo: `${id}`});
  // Publishes the message as a string, e.g. "Hello, world!" or JSON.stringify(someObject)
  const dataBuffer = Buffer.from(data);
  try {
    const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
    console.log(`Message ${messageId} published.`);
  } catch (error) {
    console.error(`Received error while publishing: ${error.message}`);
  }
}

async function deployModelWithNodeCount(modelId) {
  const projectId = 'ireda1';
  const location = 'us-central1';

  // Construct request
  const request = {
    name: client.modelPath(projectId, location, modelId),
    imageClassificationModelDeploymentMetadata: {
      nodeCount: 1,
    },
  };

  const [operation] = await client.deployModel(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Model deployment finished. ${response}`);
}


Answers (1)

NeatNerd

There are several improvements you could consider for your code. First of all, it is important to understand that Cloud Functions are short-lived: a function can run for at most 9 minutes. Cloud Functions are not meant for background operations. If you are looking for a solution that can run in the background and requires minimal infrastructure, I would recommend having a look at Cloud Run.

Now let's look at some parts of the code and how they can be improved with a different architecture that keeps Cloud Functions and Pub/Sub as the backbone.

Waiting on model deployment

The code you use is:

if (apple.includes('True')) {
  const appleF = apple.replace(/projects\/[a-zA-Z0-9-]*\/locations\/[a-zA-Z0-9-]*\/models\//, '');
  deployModelWithNodeCount(appleF);
  pubSub(appleF);
} else {
  getOperationStatus(opped, data);
}

First of all, I would strongly suggest not using recursion here, because a) this can be handled with a simple loop, and b) you are bombarding the service with requests without any delay or back-off policy. The latter might result in your service crashing or the endpoint starting to reject your requests.
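To make the back-off point concrete, here is a sketch of capped exponential back-off with "full jitter": the wait before retry n is a random value between 0 and min(cap, base × 2^n). The function names and the base and cap values are illustrative assumptions, not AutoML recommendations.

```javascript
'use strict';

// Delay (ms) before retry number `attempt` (0-based), using capped
// exponential back-off with full jitter. `random` is injectable for testing.
function backoffDelay(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Resolves after `ms` milliseconds; use as `await sleep(backoffDelay(n))`.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```

The random spread keeps many pollers from hitting the endpoint in lockstep, and the cap keeps a single wait from growing past a predictable bound.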

To improve your code, you could at least delay each retry with setTimeout, like this:

setTimeout(() => getOperationStatus(opped, data), 1000);

For readability, I would also suggest using a loop instead, since you are already using async patterns:

let status = await getOperationStatus(opped, data);
while (!status) {
  await new Promise(resolve => setTimeout(resolve, 1000));
  status = await getOperationStatus(opped, data);
}

In this case, you need to split the logic into two functions: 1) getOperationStatus, which only returns the status, and 2) waitForDeployment, which polls for the status, compares it with the expected result, and decides to either a) wait and retry or b) give up and return.
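A sketch of that split, assuming operationsClient is the client.operationsClient from the question's code (its getOperation({name}) resolves to [operation], as the question already relies on) and that operation.done is true once the long-running operation has finished. The attempt limit and delay are illustrative; inside a Cloud Function the total wait must still fit within the function timeout.

```javascript
'use strict';

// 1) Only fetches and returns the status: no decisions, no recursion.
async function getOperationStatus(operationsClient, operationName) {
  const [operation] = await operationsClient.getOperation({name: operationName});
  // A finished operation carries either `error` or `response`.
  if (operation.error && operation.error.code) {
    throw new Error(`Operation failed: ${operation.error.message}`);
  }
  return Boolean(operation.done);
}

// 2) Polls, compares with the expected result, and either waits and retries
//    or gives up. Resolves to true if done, false if attempts ran out.
async function waitForDeployment(operationsClient, operationName,
                                 {maxAttempts = 30, delayMs = 10000} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await getOperationStatus(operationsClient, operationName)) {
      return true;
    }
    // Do not sleep after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

With the defaults this waits at most about 5 minutes (30 checks, 10 seconds apart), which stays under the 9-minute limit but cannot cover a training job that takes hours.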

This might make your code better, but it does not solve the fundamental problem with the system design. To understand this, let's look at splitting responsibilities and structuring the system differently. As a side note, the guide here is not meant for a Cloud Functions application.

[Diagram: proposed architecture]

A few explanations:

  • Activation Function: initializes the entire process by calling Cloud AutoML Vision to start the long-running operation. It only gets the ID of the operation and pushes it to the queue.
  • Cloud Scheduler: pushes a trigger to Pub/Sub every X minutes (alternatively, it can call the function directly as an HTTP endpoint), signaling that it is time to check on the progress.
  • Polling Function: once triggered, asks for the next ID to check and queries Cloud AutoML. If the operation is finished, it acknowledges the message and writes the results; otherwise it exits. Be careful with how acknowledgments are configured here. Useful information is here.
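A rough sketch of the Polling Function's logic for one trigger. It does a single check per pending operation and never waits: finished operations are passed to onDone and then acknowledged, while unfinished ones are left unacknowledged so Pub/Sub redelivers them after the ack deadline, in time for a later trigger. pollOnce, pullPending, isDone, acknowledge and onDone are hypothetical names (for example, adapters around a Pub/Sub pull subscription and the AutoML operations client), passed in so the control flow can run without Google Cloud.

```javascript
'use strict';

// Runs once per Cloud Scheduler trigger. Each pending item is
// {ackId, operationName}. Nothing here sleeps or loops on one operation.
async function pollOnce({pullPending, isDone, acknowledge, onDone}) {
  const pending = await pullPending();
  const finishedAckIds = [];

  for (const item of pending) {
    if (await isDone(item.operationName)) {
      // Record the result before acknowledging: a crash in between causes a
      // redelivery rather than a lost result, so onDone should be idempotent.
      await onDone(item.operationName);
      finishedAckIds.push(item.ackId);
    }
    // Unfinished items are not acknowledged and will be redelivered.
  }

  if (finishedAckIds.length > 0) {
    await acknowledge(finishedAckIds);
  }
  return {checked: pending.length, finished: finishedAckIds.length};
}
```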

Polling of the status

A minor thing I noticed is how you are polling the status. Why not just query this URL: GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id and check whether the operation is done (check here for details)?
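A sketch of that direct query, assuming Node.js 18+ (for the global fetch) and an OAuth 2.0 access token obtained elsewhere (for example with google-auth-library). The response body is a long-running Operation: done is absent or false while it runs, and a finished operation carries either error or response. isOperationDone is a hypothetical name, and fetchFn is a parameter only so the function can be tested offline.

```javascript
'use strict';

// operationName looks like
// "projects/PROJECT/locations/us-central1/operations/OPERATION_ID".
async function isOperationDone(operationName, accessToken, fetchFn = fetch) {
  const url = `https://automl.googleapis.com/v1/${operationName}`;
  const res = await fetchFn(url, {
    headers: {Authorization: `Bearer ${accessToken}`},
  });
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  const operation = await res.json();
  // A failed operation is also "done", so surface the error explicitly.
  if (operation.error) {
    throw new Error(`Operation failed: ${operation.error.message}`);
  }
  return operation.done === true;
}
```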

Conclusion: Cloud Functions are short-lived and should handle only one operation at a time, without waiting. If you want a simple loop that waits for results, use Cloud Run.
