Reputation: 1849
I have a Cloud Function configured to be triggered on google.storage.object.finalize
in a storage bucket. It ran well for a while, but recently I started getting FileNotFoundError
when trying to read the file. If I download the same file through gsutil or the console, it works fine.
Code sample:
import pandas as pd

def main(data, context):
    full_filename = data['name']
    bucket = data['bucket']
    # Intermittently raises FileNotFoundError
    df = pd.read_csv(f'gs://{bucket}/{full_filename}')
The error occurs most often when the file has been overwritten. The bucket has object versioning enabled.
Is there anything I can do?
Upvotes: 0
Views: 1562
Reputation: 4660
As clarified in this other similar case here, caching between Cloud Functions and Cloud Storage can sometimes be an issue: when a file is overwritten, a stale cache entry can make it impossible to find, causing the FileNotFoundError
to show up.
Calling invalidate_cache
before reading the file can help in these situations, since the read then bypasses the cache and the error is avoided. Using invalidate_cache
looks like this:
import gcsfs

fs = gcsfs.GCSFileSystem()
fs.invalidate_cache()  # clear gcsfs's cached listings/metadata
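Putting the two pieces together, a minimal sketch of the handler might look like this. This assumes pandas uses gcsfs under the hood for gs:// paths (its default filesystem backend for Cloud Storage), so invalidating the gcsfs cache before the read forces a fresh lookup of the overwritten object:

```python
def main(data, context):
    # Imported inside the handler so the module stays importable
    # (and keeps cold starts light) even without GCS access.
    import gcsfs
    import pandas as pd

    full_filename = data['name']
    bucket = data['bucket']

    # Drop gcsfs's cached listing/metadata so the overwritten object
    # is re-fetched instead of being resolved from a stale cache entry.
    fs = gcsfs.GCSFileSystem()
    fs.invalidate_cache()

    df = pd.read_csv(f'gs://{bucket}/{full_filename}')
    return df
```

This is only a sketch; in production you would likely reuse a single GCSFileSystem instance across invocations and invalidate only the affected path.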
Upvotes: 3
Reputation: 31
Check in the function logs whether your function is triggered twice on a single object finalize: once with
'size': '0'
and a second time with the actual object size. If your function fails on the first invocation, you can simply filter it out by checking the
size
attribute and continuing only if it is non-zero.
import pandas as pd

def main(data, context):
    object_size = data['size']
    # Skip the spurious zero-size finalize event
    if object_size != '0':
        full_filename = data['name']
        bucket = data['bucket']
        df = pd.read_csv(f'gs://{bucket}/{full_filename}')
I don't know what exactly causes the double-triggering, but I had a similar problem once when using Cloud Storage FUSE, and this was a quick solution that fixed it.
Upvotes: 0