Reputation: 373
I'm using Google Colab for a machine learning project. I've mounted my drive, activated the GPU, purchased extra Google Drive storage, and have over 100 GB of free space on Google Drive, but the "drive" monitor in my Colab notebook says that the drive is filling up. Screenshots of my Colab notebook and Google Drive storage are below.
Why does the notebook show that the drive is filling up? I've refreshed Google Drive several times to make sure that the storage space it shows is accurate.
Upvotes: 12
Views: 30405
Reputation: 4648
See my ncdu screenshot below, taken while using my Colab Pro subscription to try to convert the 150 GB Llama-2 70B model to GGML format.
Although I am working on the data directly on the mounted Google Drive (of which I have 2 TB, vs. the 200 GB of free space on the Colab Pro local storage), what you're seeing in ncdu is that all the input files (`d5*` in the cache, but a bunch of `.pth` files on the mounted filesystem) AND the output file (`d10` in the cache, but a single `.bin` file on the mounted filesystem) are being cached in `/root/.config/Google/DriveFS`.
They will remain there, taking up space, until no process is using them anymore. Once your processing is complete, you'll see the `drive` process consuming CPU cycles in `top` until it finishes writing everything to your Google Drive and finally deletes the cache.
This is why your local disk usage goes up even when you're working directly on the Google Drive mount.
I have not yet been able to find a workaround for this regrettable behaviour. In this specific case, being able to disable the cache would have been good, because the conversion process is read-once, write-once.
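If you want to watch this happen rather than infer it from the disk monitor, you can measure the cache directory yourself. The following is a minimal sketch, assuming the cache still lives at `/root/.config/Google/DriveFS` as observed in the ncdu output above (the location may differ between Colab versions). `dir_size_bytes` and `cache_report` are hypothetical helper names, not part of any Colab API.

```python
import os
import shutil

# Cache location observed above; an assumption that may change between Colab versions.
DRIVEFS_CACHE = "/root/.config/Google/DriveFS"


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so nothing is counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The cache churns; a file may disappear while we walk.
                pass
    return total


def cache_report(cache_dir: str = DRIVEFS_CACHE, local_root: str = "/") -> dict:
    """Compare the DriveFS cache size with the local disk's used and free space."""
    usage = shutil.disk_usage(local_root)
    return {
        "cache_gb": dir_size_bytes(cache_dir) / 1e9,
        "local_used_gb": usage.used / 1e9,
        "local_free_gb": usage.free / 1e9,
    }


if __name__ == "__main__":
    for key, value in cache_report().items():
        print(f"{key}: {value:.2f}")
```

Running this periodically during a long job should show `cache_gb` tracking the growth of `local_used_gb`, if the caching behaves as described.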
Upvotes: 1
Reputation: 407
The reason your Colab disk space fills up as you read from or write to your mounted Google Drive is that Colab caches contents from Google Drive on its own local storage.
Cached items that aren't in use are evicted as the Colab disk fills, but often the reason you're mounting Google Drive in the first place is that you want more files in use than fit locally. Because of the file caching, you'll pretty much be out of luck there.
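Given that behaviour, a practical check before a job is whether the files you plan to keep open at the same time would fit on the local disk. The sketch below assumes, per the explanation above, that each in-use file from the Drive mount occupies roughly its own size in the local cache; `fits_in_local_cache` is a hypothetical helper, and the 5 GiB `headroom_bytes` margin is an arbitrary choice, not a Colab setting.

```python
import os
import shutil


def fits_in_local_cache(paths, local_root="/", headroom_bytes=5 * 1024**3):
    """Return True if all files in `paths` could sit in the local cache at once.

    Assumes every file read from the Drive mount is copied to the runtime's
    local disk while in use. `headroom_bytes` is an arbitrary safety margin.
    """
    needed = sum(os.path.getsize(p) for p in paths)
    free = shutil.disk_usage(local_root).free
    return needed + headroom_bytes <= free
```

For example, `fits_in_local_cache(glob.glob("/content/drive/MyDrive/model/*.pth"))` (the directory is a hypothetical placeholder) gives a rough go/no-go before starting a conversion like the one in the previous answer.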
Upvotes: 3
Reputation: 2945
Google Drive storage and the Google Colab disk are different. Google Drive storage is an object storage system, while the Google Colab disk is a file system that you use during the Colab runtime (usually an SSD, or an HDD). The two are different and serve different purposes (read here for more).
The problem you are facing is a shortage of local disk space on the runtime. Google Colab has no option to increase this. You need to move to cloud notebooks to increase the size of the runtime SSD.
Upvotes: 0
Reputation: 1169
The Google Drive storage and Google Colab disk space are different.
Google Drive storage is the space you are given in Google's cloud, whereas the Colab disk space is the amount of storage on the machine allotted to you at that time. You can increase the storage by changing the runtime type.
A machine with a GPU has more memory and disk space than a CPU-only runtime. Similarly, if you want more, you can change the runtime to a TPU machine.
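To compare what each runtime type actually gives you, you can record a few facts after switching and reconnecting. A minimal sketch using only the standard library; detecting a GPU via `nvidia-smi` on `PATH` is an assumption that holds for NVIDIA GPU runtimes, and this does not try to detect TPUs. `runtime_summary` is a hypothetical helper name.

```python
import os
import shutil


def runtime_summary(local_root="/"):
    """Summarize the current runtime so different runtime types can be compared."""
    disk = shutil.disk_usage(local_root)
    return {
        # nvidia-smi is normally on PATH on NVIDIA GPU runtimes, absent on CPU-only ones.
        "gpu_visible": shutil.which("nvidia-smi") is not None,
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
    }
```

Calling `print(runtime_summary())` once per runtime type gives a side-by-side view of the local disk you would be working with.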
Upvotes: 8