bw4sz

Reputation: 2257

Google Cloud Dataflow: Failed to write a file to temp location

I am building a Beam pipeline on Google Cloud Dataflow.

I am getting an error saying that Cloud Dataflow does not have permission to write to the temp directory.

[screenshot: the Dataflow error message]

This is confusing, since Dataflow clearly has the ability to write to the bucket: it created the staging folder.

[screenshot: the staging folder Dataflow created in the bucket]

Why would it be able to write the staging folder, but not the temp folder?

I am running from within a Docker container on a Compute Engine instance, and I am fully authenticated with my service account.

PROJECT=$(gcloud config list project --format "value(core.project)")
BUCKET=gs://$PROJECT-testing

python tests/prediction/run.py \
    --runner DataflowRunner \
    --project $PROJECT \
    --staging_location $BUCKET/staging \
    --temp_location $BUCKET/temp \
    --job_name $PROJECT-deepmeerkat \
    --setup_file tests/prediction/setup.py
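As a sanity check, I can confirm which account the container is using and try a throwaway write under the temp prefix, reusing $BUCKET from the script above (the test filename is just something I made up):

# Which account is the container actually authenticated as?
gcloud auth list

# Try a throwaway write under the temp prefix, then clean it up
echo "write test" > /tmp/write-test.txt
gsutil cp /tmp/write-test.txt $BUCKET/temp/write-test.txt
gsutil rm $BUCKET/temp/write-test.txt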

EDIT

In response to @alex amato

  1. Does the bucket belong to the project or is it owned by another project? Yes. When I go to the home screen for the project, this is one of four buckets listed. I commonly upload data and interact with other Google Cloud services (e.g. the Cloud Vision API) from this bucket.

  2. Would you please provide the full error message?

    "(8d8bc4d7fc4a50bd): Failed to write a file to temp location 'gs://api-project-773889352370-testing/temp/api-project-773889352370-deepmeerkat.1498771638.913123'. Please make sure that the bucket for this directory exists, and that the project under which the workflow is running has the necessary permissions to write to it."

    "8d8bc4d7fc4a5f8f): Workflow failed. Causes: (8d8bc4d7fc4a526c): One or more access checks for temp location or staged files failed. Please refer to other error messages for details. For more information on security and permissions, please see https://cloud.google.com/dataflow/security-and-permissions."

  3. Can you confirm that there isn't already an existing GCS object which matches the name of the GCS folder path you are trying to use?

Yes, there is no folder named temp in the bucket.

  4. Could you please verify the permissions you have match the members you run as?

Bucket permissions have global admin (the same checks from the shell are sketched below):

[screenshot: bucket permissions showing the admin member]

which matches my gcloud auth:

[screenshot: gcloud auth list showing the active service account]
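For reference, roughly the same checks as in the screenshots can be run from the shell, reusing $PROJECT and $BUCKET from the launch script (these commands are just illustrative):

# Bucket metadata, including the owning project and its ACL
gsutil ls -L -b $BUCKET

# Project-level IAM bindings, to compare against the bucket ACL
gcloud projects get-iam-policy $PROJECT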

Upvotes: 5

Views: 9970

Answers (3)

avermeir

Reputation: 11

I ran into the same issue with a different cause: I had set object retention policies, which prevent manual deletions. Since renaming an object triggers a deletion, this error occurred.

Therefore, if anyone runs into a similar issue, check your temp bucket's properties and consider lifting any retention policies.
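If it helps, a recent enough Cloud SDK can inspect and (if the policy is not locked) remove the retention settings from the command line; the bucket name here is a placeholder:

# Show any retention policy configured on the temp bucket
gsutil retention get gs://YOUR-TEMP-BUCKET

# Remove the policy, provided it has not been locked
gsutil retention clear gs://YOUR-TEMP-BUCKET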

Upvotes: 1

bw4sz

Reputation: 2257

@chamikara was correct. Despite inheriting credentials from my service account, Cloud Dataflow needs its own credentials. Quoting the suggestion:

"Can you also give access to the cloudservices account (<project-number>@developer.gserviceaccount.com), as mentioned in cloud.google.com/dataflow/security-and-permissions?"
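For reference, roughly how that grant can be made from the shell, reusing $PROJECT and $BUCKET from the question (the objectAdmin role is my choice here, not something the docs mandate):

# Look up the project number and grant the cloudservices account access to the bucket
PROJECT_NUMBER=$(gcloud projects describe $PROJECT --format "value(projectNumber)")
gsutil iam ch serviceAccount:${PROJECT_NUMBER}@developer.gserviceaccount.com:objectAdmin $BUCKET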

Upvotes: 3

Guy P

Reputation: 1423

I got similar errors while moving from DirectRunner to DataflowRunner:

Staged package XXX.jar at location 'gs://YYY/staging/XXX.jar' is inaccessible.

After playing with the permissions, this is what I did: in the Storage Browser, I clicked Edit Bucket Permissions (for the specific bucket) and added the right Storage permission for the member [email protected]
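To double-check the result, the staged object itself can be inspected from the shell (XXX and YYY are the placeholders from the error message above):

# Confirm the staged package exists and is readable with the current credentials
gsutil stat gs://YYY/staging/XXX.jar

# Show which members can read the staged object
gsutil acl get gs://YYY/staging/XXX.jar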

I hope this saves other users some time as well.

Upvotes: 0
