Reputation: 793
We are creating a training job on AI Platform using the Python googleapiclient library:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import datetime
import logging

credentials = GoogleCredentials.get_application_default()
training_inputs = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'packageUris': ['package_bucket_file_path'],
    'pythonModule': 'randomforest_trainer_RUL.train',
    'args': [
        '--trainFilePath', data[0],
        '--trainOutputPath', data[2],
        '--testFilePath', data[1],
        '--testOutputPath', data[3],
        '--target', target_label,
        '--bucket', BUCKET,
        '--expid', experiment_id
    ],
    'region': 'region_of_bucket',
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5'
}
timestamp = datetime.datetime.now().strftime('%y%m%d_%H%M%S%f')
job_name = "job_" + experiment_id
logging.info("Job Name:{}".format(job_name))
# Assemble the job spec expected by projects().jobs().create()
job_spec = {'jobId': job_name, 'trainingInput': training_inputs}
api = discovery.build('ml', 'v1', credentials=credentials, cache_discovery=False)
project_id = 'projects/{}'.format(PROJECT)
request = api.projects().jobs().create(body=job_spec, parent=project_id)
response = request.execute()
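Once the create call succeeds, the same client can be used to follow the job. This is a minimal sketch, not from the question itself: it assumes the `api` object built above, and the helper names (`job_resource_name`, `wait_for_job`) are mine.

```python
import time

def job_resource_name(project, job_id):
    # Fully qualified resource name expected by projects().jobs().get()
    return 'projects/{}/jobs/{}'.format(project, job_id)

def wait_for_job(api, project, job_id, poll_seconds=30):
    # Poll until the job leaves the active states, then return its final record.
    name = job_resource_name(project, job_id)
    while True:
        job = api.projects().jobs().get(name=name).execute()
        if job['state'] not in ('QUEUED', 'PREPARING', 'RUNNING'):
            return job
        time.sleep(poll_seconds)
```

The returned record's `state` field will read SUCCEEDED, FAILED, or CANCELLED, and a failed job carries an `errorMessage` like the one below.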
This was working: until yesterday I was able to train the model and run testing and prediction. But all of a sudden I'm no longer able to train the model on AI Platform, and the job fails with:
The replica master 0 exited with a non-zero status of 1. Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 810, in ls
    combined_listing = self._ls(path, detail) + self._ls(path + "/", detail)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-12>", line 2, in _ls
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 820, in _ls
    listing = self._list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-5>", line 2, in _list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 616, in _list_objects
    listing = self._do_list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-6>", line 2, in _do_list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 637, in _do_list_objects
    maxResults=max_results,
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-2>", line 2, in _call
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 517, in _call
    validate_response(r, path)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 171, in validate_response
    raise IOError("Forbidden: %s\n%s" % (path, msg))
OSError: Forbidden: https://www.googleapis.com/storage/v1/b/some-storage-bucket/o/
[email protected] does not have serviceusage.services.use access to project 34XX12XX12X.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=87XX90XX1XX&resource=ml_job%2Fjob_id%2Fjob_5de3592da3c3c541d73389er&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22job_5de3592da3c3c541d73389erce%22
The key part of the error is:

[email protected]
does not have serviceusage.services.use access to project 34XX12XX12X
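For context, this message means the job's service account lacks the serviceusage.services.use permission on the project the request is billed to. If it really were a plain IAM problem, one way to grant it would be the Service Usage Consumer role; this is a sketch with hypothetical placeholder names, since the actual account and project IDs are redacted in the error above:

```shell
# PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders -- substitute
# the project number and service account shown in your error message.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/serviceusage.serviceUsageConsumer"
```

As the answer below explains, though, the trigger here was a gcsfs release change rather than any change to IAM policy.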
Upvotes: 2
Views: 437
Reputation: 1236
I had the exact same problem today. As Nick said, it's caused by the new gcsfs release. Instead of using pd.read_csv(gcs_path), I suggest reading the CSV file from the bucket directly with TensorFlow's tf.gfile.GFile:
import pandas as pd
import tensorflow as tf

def read_csv_from_gcs(gcs_path, opts=None):
    # GFile reads the object through TensorFlow's own GCS client,
    # bypassing gcsfs entirely.
    with tf.gfile.GFile(gcs_path) as f:
        if opts:
            return pd.read_csv(f, **opts)
        return pd.read_csv(f)
This lets the job run without breaking.
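Since the root cause is a gcsfs release, another workaround (my suggestion, not part of this answer) is to pin gcsfs to the pre-0.4 line in the trainer package's setup.py, so AI Platform installs the older, working version; the package name and version here are taken from the question and otherwise assumed:

```python
from setuptools import find_packages, setup

setup(
    name='randomforest_trainer_RUL',
    version='0.1',
    packages=find_packages(),
    # Pin gcsfs below 0.4, the release that changed the GCS access
    # checks behind the Forbidden error (assumption).
    install_requires=['gcsfs<0.4'],
)
```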
Upvotes: 3