Reputation: 4166
I submitted a training job to Cloud ML Engine but it failed with an out-of-memory error. How can I specify more memory for my job?
Upvotes: 0
Views: 335
Reputation: 4166
If you don't specify --scale-tier when you submit a Cloud ML Engine job, the job runs on the BASIC tier, which is a single-CPU machine with 4 GB of memory.
To use an 8-CPU machine that has 52 GB of memory:
(1) Create a file named largemachine.yaml with this content:
trainingInput:
  scaleTier: CUSTOM
  masterType: large_model
(2) Add this to your ml-engine job submission:
gcloud ml-engine jobs submit training $JOB_NAME \
    ...
    --scale-tier=CUSTOM \
    --config=largemachine.yaml \
    -- \
    ...
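The YAML in step (1) only works if scaleTier and masterType are indented under trainingInput, and that is easy to lose when copying from a web page. As a minimal sketch, assuming a POSIX shell, the file can be written with a heredoc and checked before submitting. The grep check is an illustrative addition, not part of gcloud:

```shell
# Write the custom-tier config from step (1); the quoted 'EOF' keeps the
# heredoc body literal so the two-space indentation is preserved.
cat > largemachine.yaml <<'EOF'
trainingInput:
  scaleTier: CUSTOM
  masterType: large_model
EOF

# Illustrative sanity check (an assumption, not a gcloud feature):
# both keys must be nested under trainingInput.
if grep -q '^  scaleTier: CUSTOM$' largemachine.yaml &&
   grep -q '^  masterType: large_model$' largemachine.yaml; then
    echo "largemachine.yaml looks OK"
else
    echo "largemachine.yaml is malformed" >&2
fi
```

Once the file is in place, pass it to the submission command from step (2) with --config=largemachine.yaml.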
See this page for other machine types (including GPU types) you can use: https://cloud.google.com/ml-engine/docs/tensorflow/machine-types#compare-machine-types
Upvotes: 2