Brian Hanechak

Reputation: 2533

Running Google Cloud ML training job but getting no stdout output in logs

I've built a trainer, and when I submit the job, it starts and the logs get populated. But none of my output to stdout ever appears in the log. I do get messages like "The TensorFlow library wasn't compiled to use AVX2 instructions..."

The entire job takes about 5 to 10 minutes on my laptop. I let it run for over an hour on the cloud server and still never saw any output, even though the first line of output appears almost immediately when I run it locally.

I can run my job locally by invoking it directly, but I haven't been able to get it to run using the "gcloud local" command. When I try, I get the error "No module named tensorflow".

Upvotes: 0

Views: 378

Answers (1)

Jeremy Lewi

Reputation: 6776

The log message "The TensorFlow library wasn't compiled to use AVX2 instructions" indicates that log messages are flowing from TensorFlow to Cloud Logging. So most likely there is a problem with the way you have configured logging, and as a result your own log messages aren't being written correctly to stderr/stdout.

The easiest way to debug this would be to create a simple example that tries to reproduce the problem.

I'd suggest creating a simple Python program that does nothing but log a message, and then submitting that to the service to see whether the log message is printed.

Something like the following:

import logging
import time

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    # Output logs for 5 minutes (30 iterations x 10 seconds). We do this
    # just to ensure the job doesn't terminate before logs can be flushed.
    for i in range(30):
        logging.info("This is an info message.")
        logging.error("This is an error message.")
        time.sleep(10)
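
Since the question is specifically about stdout, a variant of the same test could send its messages to stdout explicitly and flush after writing. This is only a sketch: the function name configure_stdout_logging is made up for illustration, and whether handler configuration or output buffering is the actual cause here is an assumption, not something the question confirms.

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO):
    """Attach a handler to the root logger that writes to stdout.

    Hypothetical helper for illustration; not part of any Google Cloud API.
    """
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root.addHandler(handler)
    return handler


if __name__ == "__main__":
    handler = configure_stdout_logging()
    logging.info("Trainer starting.")
    # flush=True pushes the text out immediately instead of leaving it
    # in a buffer until the process exits.
    print("Plain print output.", flush=True)
    handler.flush()
```

If these messages show up in the job logs but the trainer's own output still does not, that would point at how the trainer sets up its logging rather than at the service.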

For the issue importing TensorFlow when running locally, please take a look at this SO question, which has some suggestions on how to check the Python path used by gcloud and verify that it includes TensorFlow.
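
As a rough way to do that check, a small script can report which interpreter is running and whether it can find TensorFlow. The helper name describe_python_environment is hypothetical, and this assumes the script gets launched the same way as the trainer in each case, so that it sees the same interpreter and path.

```python
import importlib.util
import sys


def describe_python_environment(module_name="tensorflow"):
    """Report the running interpreter and whether a module can be found.

    Hypothetical helper for illustration. find_spec locates a top-level
    module without importing it.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "path": list(sys.path),
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_python_environment()
    print("Interpreter:", info["executable"])
    print("tensorflow importable:", info["module_found"])
    print("tensorflow location:", info["module_location"])
    for entry in info["path"]:
        print("  sys.path entry:", entry)
```

Comparing the output from a direct run with the output from a run through gcloud should show whether the two use different interpreters or search paths.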

Upvotes: 1
