Reputation: 3216
I know tf.python_io.TFRecordWriter
has a concept of GCS, but it doesn't seem to have permissions to write to it.
If I do the following:
output_path = 'gs://my-bucket-name/{}/{}.tfrecord'.format(object_name, record_name)
writer = tf.python_io.TFRecordWriter(output_path)
# write to writer
writer.close()
then I get 401s saying "Anonymous caller does not have storage.objects.create access to my-bucket-name."
However, on the same machine, if I run gsutil rsync -d -r gs://my-bucket-name bucket-backup, it syncs properly, so I have authenticated correctly with gcloud.
How can I give TFRecordWriter
permissions to write to GCS? I'm going to just use Google's GCP Python API for now, but I'm sure there's a way to do this using TF alone.
Upvotes: 4
Views: 1842
Reputation: 196
Note that the export command won't work in a Jupyter notebook. If you're in a Jupyter notebook, this should work instead:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json'
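With the variable set before TensorFlow touches GCS, the writer from the question should pick up the credentials. A minimal sketch (the bucket name is taken from the question; the object path and key location are placeholders):
import os
# Must be set before TensorFlow opens any gs:// path
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json'

import tensorflow as tf

# Example path in the question's bucket (placeholder object/record names)
output_path = 'gs://my-bucket-name/example/example.tfrecord'
writer = tf.python_io.TFRecordWriter(output_path)
writer.write(b'example record')
writer.close()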
Upvotes: 0
Reputation: 81336
A common strategy for setting up credentials on a system is Application Default Credentials (ADC). ADC is a strategy for locating Google Cloud service account credentials.
If the environment variable GOOGLE_APPLICATION_CREDENTIALS
is set, ADC uses the file the variable points to for service account credentials. This file is a Google Cloud service account key in JSON format. The older P12 (PFX) certificates are deprecated.
If the environment variable is not set, the default service account is used for credentials when the application is running on Compute Engine, App Engine, Kubernetes Engine, or Cloud Functions.
If the previous two steps fail to find valid credentials, ADC will fail, and an error occurs.
For this question, ADC could not find credentials, so the TensorFlow write to GCS failed.
The solution is to set the environment variable GOOGLE_APPLICATION_CREDENTIALS
to point to the service account JSON key file.
For Linux:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
For Windows:
set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\service-account.json
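Once the variable is set, you can check which credentials ADC resolves. A quick sketch using the google-auth library, which the Cloud client libraries depend on:
import google.auth

# default() applies the ADC search order: the env var first, then the metadata server
credentials, project = google.auth.default()
print('Project:', project)
print('Service account:', getattr(credentials, 'service_account_email', 'n/a'))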
I wrote an article that goes into more detail on ADC.
Google Cloud Application Default Credentials
Upvotes: 4
Reputation: 1301
When you use the gsutil command, you are using the GCP user configured in the Cloud SDK (run gcloud config list to see it).
Your Python script is most likely not authenticated with GCP.
I believe there is a better approach to solve this (sorry, I don't know a lot about TensorFlow), but I can see two workarounds to fix it:
First option - Mount the Cloud Storage bucket as a file system using Cloud Storage FUSE (gcsfuse); see the sketch just below.
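A rough sketch of the FUSE approach, assuming gcsfuse is installed and the bucket from the question is mounted at /mnt/my-bucket-name (both the mount point and the object path are illustrative):
# Mount the bucket first, for example:
#   gcsfuse my-bucket-name /mnt/my-bucket-name
import os
import tensorflow as tf

# TFRecordWriter only sees a local path; gcsfuse forwards the I/O to GCS
output_dir = '/mnt/my-bucket-name/example'
os.makedirs(output_dir, exist_ok=True)  # the directory must exist before writing

writer = tf.python_io.TFRecordWriter(os.path.join(output_dir, 'example.tfrecord'))
writer.write(b'example record')
writer.close()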
Second option - Write locally and move later. In this approach, you can use this code:
# Imports the Google Cloud client library
from google.cloud import storage

# Service account JSON key file
JSON_FILE_NAME = '<Service account json file>'

# Instantiates a client authenticated with the service account
storage_client = storage.Client.from_service_account_json(JSON_FILE_NAME)

# Example file to upload (using the service account)
source_file_path = 'your file path'
destination_blob_name = 'name of file in gcs'

# The name of the target bucket
bucket_name = '<bucket_name>'

bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_path)

print('File {} uploaded to {}.'.format(
    source_file_path,
    destination_blob_name))
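To connect this back to the question, a hedged sketch of the full write-locally-then-upload flow (the key file, bucket name, and object name are placeholders):
import tensorflow as tf
from google.cloud import storage

# 1. Write the TFRecord to local disk, where no GCS credentials are needed
local_path = '/tmp/example.tfrecord'
writer = tf.python_io.TFRecordWriter(local_path)
writer.write(b'example record')
writer.close()

# 2. Upload the finished file with the authenticated Storage client
storage_client = storage.Client.from_service_account_json('<Service account json file>')
bucket = storage_client.get_bucket('my-bucket-name')
bucket.blob('example/example.tfrecord').upload_from_filename(local_path)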
Upvotes: 1