Reputation: 2533
I've built a trainer, and when I submit the job, it starts and the logs get populated. But none of my output to stdout ever appears in the log. I do get messages like "The TensorFlow library wasn't compiled to use AVX2 instructions..."
The entire job takes about 5 to 10 minutes on my laptop. I let it run for over an hour on the cloud server and still never saw any output, even though the first line of output appears almost immediately when I run it locally.
I can run my job locally by invoking it directly, but I haven't been able to get it to run using the "gcloud local" command. When I try, I get the error "No module named tensorflow".
Upvotes: 0
Views: 378
Reputation: 6776
The log message "The TensorFlow library wasn't compiled to use AVX2 instructions" indicates that log messages are flowing from TensorFlow to Cloud Logging. So most likely the problem is in the way you have configured logging, and as a result your own messages aren't being written correctly to stderr/stdout.
The easiest way to debug this is to create a simple example that tries to reproduce the error.
I'd suggest writing a simple Python program that does nothing but log a message, then submitting it to the service to see whether the log message is printed.
Something like the following:
import logging
import time

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    # Output logs for 5 minutes. We do this for 5 minutes just to ensure
    # the job doesn't terminate before logs can be flushed.
    for i in range(30):
        logging.info("This is an info message.")
        logging.error("This is an error message.")
        time.sleep(10)
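Since the question is specifically about stdout, a variant of the test program above could emit both a flushed `print` and a logging record routed to stdout, so you can see which of the two reaches the log. This is only a sketch for Python 3; the helper names `configure_stdout_logging` and `emit_test_messages` are made up for illustration, and whether buffering is actually the cause in your job is an assumption to test, not something I can confirm.

```python
import logging
import sys
import time


def configure_stdout_logging(level=logging.INFO):
    """Attach a handler that writes root-logger records to stdout.

    Hypothetical helper for this test, not part of any gcloud or
    TensorFlow API.
    """
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    root.addHandler(handler)
    return handler


def emit_test_messages(count=30, delay_seconds=10):
    """Write one print line and one logging line per iteration."""
    for i in range(count):
        # flush=True pushes the line out immediately instead of leaving
        # it in the stdout buffer.
        print("print message %d" % i, flush=True)
        logging.info("logging message %d", i)
        time.sleep(delay_seconds)
    return count
```

Calling `configure_stdout_logging()` followed by `emit_test_messages()` from the job's entry point runs for about 5 minutes with the defaults, matching the test program above. If the logging lines show up but the print lines do not (or the other way round), that narrows down where the output is being lost.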
For the issue importing TensorFlow when running locally, please take a look at this SO question, which has some suggestions on how to check the Python path used by gcloud and verify that it includes TensorFlow.
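As a rough illustration of that kind of check, the sketch below reports which interpreter is running, what is on its module search path, and whether a given module can be found without importing it. The function name `describe_python_environment` is hypothetical, and the script only tells you about the interpreter that runs it, so it is useful only if you run it with the same Python that gcloud uses.

```python
import importlib.util
import sys


def describe_python_environment(module_name="tensorflow"):
    """Report the interpreter, its search path, and whether a module resolves.

    Hypothetical helper for illustration; module_name should be a
    top-level package name.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "path": list(sys.path),
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


info = describe_python_environment()
print("Interpreter:", info["executable"])
print("tensorflow found:", info["module_found"])
print("tensorflow location:", info["module_location"])
for entry in info["path"]:
    print("sys.path entry:", entry)
```

If `tensorflow found` is `False` here but `import tensorflow` works in your usual shell, the two are most likely using different Python installations or environments.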
Upvotes: 1