Reputation: 901
I want to use s3fs based on fsspec to access files on S3, mainly because of two neat features:

- local caching of files to disk, with a check whether the remote file has changed ("filecache")
- access to specific versions of the same file via S3 object versioning (version_aware / version_id)
I don't need this for high-frequency use and the files don't change often. It is mainly for unit/integration test data stored on S3, which changes only when tests and related test data get updated (versions!).
I got both of the above working separately just fine, but I can't get the combination of the two to work. That is, I want to be able to cache different versions of the same file locally. As soon as a filecache is used, the version_id disambiguation seems to be lost.
import fsspec

fs = fsspec.filesystem("filecache", target_protocol="s3", cache_storage="/tmp/aws",
                       check_files=True, version_aware=True)
with fs.open("s3://my_bucket/my_file.txt", "r", version_id=version_id) as f:
    text = f.read()
No matter what version_id is, I always get the most recent file from S3, which is also the one that gets cached locally.
What I expect is that I always get the correct file version and the local cache either keeps separate files for each version (preferred) or just updates the local file whenever I request a version different from the cached one.
Is there a way I can achieve this with the current state of the libraries, or is this currently not possible? I am using s3fs==fsspec==2022.3.0.
Upvotes: 3
Views: 907
Reputation: 901
After checking with the developers, this combination seems not to be possible with the current state of the libraries: the hash of the target file used by the cache is based on the file path alone, disregarding any other kwargs such as version_id.
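Until the cache key takes version_id into account, one can maintain the per-version cache manually. A rough sketch, using only the version-aware s3fs access shown in the question (the cache directory and helper name are illustrative, not library API):

import os
import s3fs

CACHE_DIR = "/tmp/aws-versions"  # illustrative location, not a library default

def cached_version(path: str, version_id: str) -> str:
    """Download a specific object version once and reuse the local copy.

    The local filename encodes both the path and the version_id, which is
    exactly the disambiguation that "filecache" currently lacks. Since a
    version_id pins immutable content, no change check is needed on reuse.
    """
    local = os.path.join(CACHE_DIR, f"{path.replace('/', '_')}__{version_id}")
    if not os.path.exists(local):
        os.makedirs(CACHE_DIR, exist_ok=True)
        fs = s3fs.S3FileSystem(version_aware=True)
        with fs.open(path, "rb", version_id=version_id) as src, open(local, "wb") as dst:
            dst.write(src.read())
    return local

Usage would then be open(cached_version("my_bucket/my_file.txt", version_id), "r") instead of opening through the filecache filesystem.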
Upvotes: 2