dong renyi

Reputation: 1

Tensorflow Object detection API, train your own model, Error : module 'sys' has no attribute 'maxint'

I am an ABAP programmer learning the TensorFlow Object Detection API by following the tutorial and using the raccoon dataset from Dat Tran (https://github.com/datitran/raccoon_dataset). Training runs on my own PC (Python 3.6.3 and TensorFlow 1.5.0), but it is slow, so I moved it to Google Cloud Platform. The job keeps failing.

The training input looks like this.

"scaleTier": "CUSTOM",
"masterType": "standard_gpu",
"workerType": "standard_gpu",
"parameterServerType": "standard",
"workerCount": "9",
"parameterServerCount": "3",
"packageUris": [
"gs://racoon/train/packages/363569b954c446566b767aabfeb047adb0ed2f25f83248417e2667aac70d0790/object_detection-0.1.tar.gz",
"gs://racoon/train/packages/363569b954c446566b767aabfeb047adb0ed2f25f83248417e2667aac70d0790/slim-0.1.tar.gz"
],
"pythonModule": "object_detection.train",
"args": [
"--train_dir=gs://racoon/train",
"--pipeline_config_path=gs://racoon/data/ssd_mobilenet_v1_pets.config"
],
"region": "us-central1",
"runtimeVersion": "1.5",
"jobDir": "gs://racoon/train",
"pythonVersion": "3.5"

The training ran for almost 100 steps and then failed with an error. The job log shows the following.

The replica worker 1 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'

The replica worker 2 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'

The replica worker 4 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1006195729918&resource=ml_job%2Fjob_id%2Fracoon_object_detection_9&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22racoon_object_detection_9%22

In my local TensorFlow install, learning.py also contains sys.maxint, and the IDE flags it as an error. Has anyone faced the same issue and found a solution? Please share it with us. Thank you very much.

Upvotes: 0

Views: 276

Answers (2)

Guoqing Xu

Reputation: 482

TensorFlow object detection API only supports TensorFlow 1.2 for now, so you need to change the runtime version to 1.2.

Upvotes: 0

Caner

Reputation: 59150

sys.maxint was removed in Python 3.0, so replace it with sys.maxsize. From the Python 3.0 release notes:

The sys.maxint constant was removed, since there is no longer a limit to the value of integers. However, sys.maxsize can be used as an integer larger than any practical list or string index. It conforms to the implementation’s “natural” integer size and is typically the same as sys.maxint in previous releases on the same platform (assuming the same build options).
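As a rough illustration of that replacement, here is a minimal sketch of a fallback that works on both Python 2 and Python 3. The names max_int and steps_to_wait are hypothetical and are not taken from learning.py; the function only mimics the general shape of an expression that caps a step count with a very large integer.

```python
import sys

# sys.maxint exists only on Python 2. sys.maxsize exists on Python 2.6+
# and Python 3, so fall back to it when maxint is missing.
max_int = getattr(sys, "maxint", sys.maxsize)


def steps_to_wait(startup_delay_steps, number_of_steps=None):
    """Hypothetical helper: cap the startup delay by the total step count.

    When number_of_steps is None or 0, the cap is effectively unlimited.
    """
    return min(startup_delay_steps, number_of_steps or max_int)


print(steps_to_wait(15))      # no step limit configured
print(steps_to_wait(15, 10))  # step limit smaller than the delay
```

If you patch a library file this way, the same edit has to be present in the package that actually runs on the cloud workers, not only in the local install.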

What doesn't make sense to me is that it works on your local machine.
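One possible explanation, offered as a guess: Python resolves sys.maxint only when the line that contains it actually executes, and the job log reports the failure only on worker replicas. If the expression sits on a code path that a single-machine run never reaches, the local run would not hit it. The sketch below uses a hypothetical function, wait_target, as a simplified stand-in and not the actual learning.py code, to show how a missing attribute can stay hidden until a particular branch runs.

```python
import sys


def wait_target(is_chief, startup_delay_steps, number_of_steps=None):
    """Hypothetical stand-in for a branch that only non-chief workers take."""
    if is_chief:
        # This path never evaluates sys.maxint, so it cannot fail on it.
        return None
    # `or` short-circuits: sys.maxint is looked up only when
    # number_of_steps is None or 0.
    return min(startup_delay_steps, number_of_steps or sys.maxint)


print(wait_target(True, 15))        # chief path: no attribute lookup
print(wait_target(False, 15, 100))  # worker with a step limit: still no lookup

try:
    wait_target(False, 15)          # worker without a step limit
except AttributeError as exc:
    print("worker path failed:", exc)
```

On Python 3 only the last call raises, which would be consistent with a job that trains fine on one machine and fails once additional workers start.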

Upvotes: 1
