Raj Kumar N

Reputation: 793

Google Cloud AI Platform error when executing a job

Using the Python googleapiclient library, we create a training job on AI Platform:

import datetime
import logging

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()

training_inputs = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'packageUris': ['package_bucket_file_path'],
    'pythonModule': 'randomforest_trainer_RUL.train',
    'args': [
        '--trainFilePath', data[0],
        '--trainOutputPath', data[2],
        '--testFilePath', data[1],
        '--testOutputPath', data[3],
        '--target', target_label,
        '--bucket', BUCKET,
        '--expid', experiment_id
    ],
    'region': "region_of_bucket",
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5'
}

timestamp = datetime.datetime.now().strftime('%y%m%d_%H%M%S%f')
job_name = "job_" + experiment_id

## logging information
logging.info("Job Name:{}".format(job_name))
##

api = discovery.build('ml', 'v1', credentials=credentials, cache_discovery=False)

project_id = 'projects/{}'.format(PROJECT)
# The Job resource body: a jobId plus the trainingInput defined above
job_spec = {'jobId': job_name, 'trainingInput': training_inputs}
request = api.projects().jobs().create(body=job_spec, parent=project_id)
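For reference, a minimal sketch of how the request above is executed and how the job can be polled afterwards (api, request, project_id and job_name are the variables from the snippet above; jobs().get is the standard googleapiclient method for reading a job's state, shown here only as an illustration):

# Submit the job; execute() sends the create request to the AI Platform API
response = request.execute()
logging.info("Submitted job, initial state: {}".format(response.get('state')))

# The same client can later poll the job, e.g.:
job_full_name = '{}/jobs/{}'.format(project_id, job_name)
status = api.projects().jobs().get(name=job_full_name).execute()
logging.info("Current state: {}".format(status.get('state')))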

This was working: I was able to train the model and run testing and prediction until yesterday. But all of a sudden I can no longer train the model on AI Platform, and the error I'm getting is:

The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 810, in ls
    combined_listing = self._ls(path, detail) + self._ls(path + "/", detail)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-12>", line 2, in _ls
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 820, in _ls
    listing = self._list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-5>", line 2, in _list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 616, in _list_objects
    listing = self._do_list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-6>", line 2, in _do_list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 637, in _do_list_objects
    maxResults=max_results,
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-2>", line 2, in _call
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 517, in _call
    validate_response(r, path)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 171, in validate_response
    raise IOError("Forbidden: %s\n%s" % (path, msg))
OSError: Forbidden: https://www.googleapis.com/storage/v1/b/some-storage-bucket/o/
[email protected] does not have serviceusage.services.use access to project 34XX12XX12X.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=87XX90XX1XX&resource=ml_job%2Fjob_id%2Fjob_5de3592da3c3c541d73389er&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22job_5de3592da3c3c541d73389erce%22

The key part of the error is:

[email protected] 
    does not have serviceusage.services.use access to project 34XX12XX12X

Upvotes: 2

Views: 437

Answers (1)

Madhi

Reputation: 1236

I had the exact same problem today. As Nick said, it's caused by the new GCSFS release. Instead of using pd.read_csv(gcs_path), I suggest reading the CSV file from the bucket directly with TensorFlow's GFile function:

import pandas as pd
import tensorflow as tf

def read_csv_from_gcs(gcs_path, opts=None):
    # tf.gfile.GFile opens gs:// paths directly, bypassing gcsfs
    with tf.gfile.GFile(gcs_path) as f:
        if opts:
            df = pd.read_csv(f, opts)
        else:
            df = pd.read_csv(f)
    return df
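For example (the bucket path here is a made-up placeholder, and read_csv_from_gcs is just the helper wrapped around the snippet above):

df = read_csv_from_gcs('gs://some-storage-bucket/data/train.csv')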

This will allow the job to run without breaking.

Upvotes: 3
