Damião Martins

Reputation: 1849

Cloud Function triggered by Cloud Storage object creation gets a file-not-found error

I have a Cloud Function configured to be triggered on google.storage.object.finalize in a storage bucket. It ran well for a while, but recently I have started getting intermittent FileNotFoundError exceptions when trying to read the file. If I download the same file with gsutil or through the console, it works fine.

Code sample:

import pandas as pd

def main(data, context):
    full_filename = data['name']
    bucket = data['bucket']
    df = pd.read_csv(f'gs://{bucket}/{full_filename}')  # intermittently raises FileNotFoundError

The error occurs most often when the file has been overwritten. The bucket has object versioning enabled.

Is there something I can do?

Upvotes: 0

Views: 1562

Answers (2)

gso_gabriel

Reputation: 4660

As clarified in another similar case, caching between Cloud Functions and Cloud Storage can sometimes be the issue: after a file is overwritten, a stale cache entry can prevent the new object from being found, which makes the FileNotFoundError show up.

Calling invalidate_cache before reading the file can help in these situations, since the read then ignores the cache and avoids the error. The code for using invalidate_cache looks like this:

import gcsfs

fs = gcsfs.GCSFileSystem()
fs.invalidate_cache()  # discard cached data before the next read
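Since the error is intermittent, the cache invalidation could also be combined with a short retry. The following is only a minimal sketch, not something from the linked case: read_with_retry, read_fn, invalidate_fn, attempts and delay_seconds are hypothetical names, and the retry itself is an added assumption on top of invalidate_cache.

```python
import time


def read_with_retry(read_fn, invalidate_fn, attempts=3, delay_seconds=1.0):
    """Clear the cache, then call read_fn; retry if FileNotFoundError is raised.

    read_fn and invalidate_fn are caller-supplied callables (hypothetical
    names), e.g. a lambda wrapping pd.read_csv and fs.invalidate_cache.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        invalidate_fn()  # drop cached data before every read attempt
        try:
            return read_fn()
        except FileNotFoundError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # brief pause before trying again
    raise last_error


# Possible use inside the function (assumes gcsfs and pandas are installed):
#   fs = gcsfs.GCSFileSystem()
#   df = read_with_retry(
#       lambda: pd.read_csv(f'gs://{bucket}/{full_filename}'),
#       fs.invalidate_cache,
#   )
```

If every attempt fails, the last FileNotFoundError is re-raised, so the function still reports the failure in its logs.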

Upvotes: 3

Zwirek009

Reputation: 31

Check the function logs to see whether your function is being triggered twice for a single object finalize event:

  • a first execution where the event attribute 'size' is '0'
  • a second execution where 'size' holds the actual object size

If your function fails on the first execution, you can simply filter it out by checking the attribute value and continuing only if it is non-zero.

import pandas as pd

def main(data, context):
    object_size = data['size']
    if object_size != '0':  # compared as a string, as in the event shown above
        full_filename = data['name']
        bucket = data['bucket']
        df = pd.read_csv(f'gs://{bucket}/{full_filename}')

I don't know exactly what causes the double triggering, but I had a similar problem once when using Cloud Storage FUSE, and this was a quick fix that solved it.

Upvotes: 0
