Reputation: 1050
I have an Inception V3 model with some input and output modifications deployed to Google Cloud ML Engine for online prediction. Over a week or so I received relatively few, sparse requests (around 130), with a median latency of about 100 ms and a 95th-percentile latency of about 2000 ms. I have already accrued around 2 node-hours. The minimum number of nodes is set to 0. This is my first time using Cloud ML Engine in production.
The questions:
I know that nodes stay up for several minutes after a request. But how can I estimate the number of requests, say per minute, that will cause the system to scale up? There seems to be no information available on the CPU usage of the nodes.
In my case I expect the number of requests to grow steadily. Should I expect node-hours to approach roughly 30*24 (days times hours in a month, i.e. one node running continuously), saturate at that value for some time, and then grow further once the CPU utilization of the prediction nodes reaches, say, 70%?
Upvotes: 0
Views: 82
Reputation: 209
We do publish request-level logs to Stackdriver. You can turn them on by creating a model with online_prediction_logging = True. Those logs include a field called loading_request, which tells you whether a request landed on a freshly started machine. Over a short time window, counting those entries gives you a rough estimate of how many nodes were brought up. For a more accurate picture of node scale-up, the feature that rhaertel80 suggested should help.
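As a sketch of how you might count loading requests once you have exported those log entries, here is a minimal Python example. The flat dict layout with a boolean `loading_request` key is an assumption for illustration, not the documented Stackdriver payload schema:

```python
def count_loading_requests(entries):
    """Count log entries flagged as landing on a freshly started node.

    `entries` is an iterable of dicts; we assume each carries a boolean
    `loading_request` field (hypothetical layout, adjust to the real payload).
    """
    return sum(1 for e in entries if e.get("loading_request"))

# Example: three requests in a window, one of which hit a cold node
logs = [
    {"loading_request": True, "latency_ms": 2100},
    {"loading_request": False, "latency_ms": 95},
    {"loading_request": False, "latency_ms": 110},
]
print(count_loading_requests(logs))  # prints 1
```

Within a short window, that count approximates the number of new machines that were spun up, since each cold node serves one loading request.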
Upvotes: 0
Reputation: 8389
You will soon be able to monitor the number of nodes in use, but you can't do so yet. In the meantime, you can make a quick-and-dirty estimate from your mean QPS and mean latency. Assuming approximately 60% utilization per node:
X qps * 0.2 secs/query / 0.6 ≈ number of nodes
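The estimate above can be sketched as a small Python helper. The 0.2 s mean latency and the 60% utilization target are the assumptions from the answer; the function and parameter names are mine:

```python
def estimate_nodes(mean_qps, mean_latency_s, utilization=0.6):
    """Back-of-the-envelope node count: busy-seconds generated per second
    of traffic (qps * latency), divided by the target utilization per node."""
    return mean_qps * mean_latency_s / utilization

# Example: 30 queries/sec at a 200 ms mean latency, 60% utilization
print(estimate_nodes(30, 0.2))  # prints 10.0
```

In other words, each in-flight query occupies a node for its full latency, so qps × latency is the average number of busy nodes, and dividing by the utilization target adds headroom.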
Upvotes: 0