Vee6

Reputation: 1577

Unable to unzip very big files from/to google buckets when mounted with gcsfuse

On Google Cloud I have a Linux Compute Engine instance and a bucket. I have mounted the bucket as a drive on the instance using gcsfuse, as recommended by Google, and from time to time a big 7zip archive (tens of GBs) is uploaded to the bucket. When I log into the instance's terminal, go to the mounted bucket folder and try to unzip the file in the same location with the command 7z x myarchive.7z, it extracts up to 100% (which takes a couple of minutes) and then fails at the end with:

ERROR: E_FAIL

Archives with Errors: 1

Afterwards, if I look at the bucket's contents, the unzipped file name is present, but its size is 0 KB.

I understand that E_FAIL is normally associated with a lack of space, but the Google bucket is supposed to have unlimited space (with restrictions on single file sizes). The command df -h, for example, reports that the mounted bucket has 1 petabyte of available storage.

Anyone out there with a similar setup / problem?

Upvotes: 1

Views: 1236

Answers (1)

norbjd

Reputation: 11277

As suggested in the comments, the unzipping process may require some specific operations on the local filesystem, even if you are issuing the command from the mounted directory.

Indeed, since a gcsfuse-mounted filesystem is not a classical FS, some operations may require transfers to the local disk (this is the case for random writes, for example; see the docs):

Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Cloud Storage. Small writes to large files work as expected, but are slow and expensive.

To ensure that the unzipping process has enough space to work, and assuming that temporary files are created locally during the process, you should increase the capacity of your local disk.
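For illustration, here is a rough sketch of two possible workarounds (the bucket name, mount point, and local paths below are placeholders, not taken from the question): either extract onto a local disk with enough free space and copy the result back with gsutil, or remount with gcsfuse's --temp-dir flag (which controls where writes are staged) pointing at a larger disk before unzipping in place.

# Option 1: extract to the local disk, then copy the result back to the bucket
7z x /mnt/my-bucket/myarchive.7z -o/mnt/local-disk/extracted
gsutil -m cp -r /mnt/local-disk/extracted/* gs://my-bucket/

# Option 2: stage gcsfuse writes on a disk with enough room, then unzip in place
fusermount -u /mnt/my-bucket
gcsfuse --temp-dir /mnt/local-disk/gcsfuse-tmp my-bucket /mnt/my-bucket

Either way, the key point is that the data being written has to fit somewhere on local storage at some stage of the process, so the local disk needs to be at least as large as the extracted output.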

Upvotes: 2
