jjschuh

Reputation: 373

Google Colab Disk space vs Google Drive disk space

I'm using Google Colab for a machine learning project. I've mounted my Drive, activated the GPU, and purchased extra Google Drive storage, so I have over 100 GB of free space on Google Drive, but the "drive" monitor in my Colab notebook says that the drive is filling up. Screenshots of my Colab notebook and Google Drive storage are below.

[Screenshots: Colab notebook disk usage and Google Drive storage]

Why does the notebook show that the drive is filling up? I've refreshed Google Drive several times to make sure that the storage figure it shows is accurate.

Upvotes: 12

Views: 30405

Answers (4)

Charl Botha

Reputation: 4648

See my NCDU screenshot below, made while using my Colab Pro subscription to try to convert the 150GB Llama-2 70B model to GGML format.

Although I am working on the data directly on the mounted Google Drive (where I have 2TB, versus the 200GB of free space on the Colab Pro local storage), NCDU shows that all the input files (d5* in the cache, a set of .pth files on the mounted filesystem) and the output file (d10 in the cache, a single .bin file on the mounted filesystem) are being cached in /root/.config/Google/DriveFS.

They remain there, taking up space, until no process is using them anymore. Once your processing is complete, you'll see the drive process using CPU cycles in top until it finishes writing everything to your Google Drive and finally deletes the cache.

This is the reason why you see your disk usage going up even when you're working directly on the Google Drive mount.
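If you want to watch this happen without NCDU, here is a minimal Python sketch that sums the size of the cache directory. The path is the one I observed above and may differ on other runtimes or Colab versions; `dir_size_bytes` and `CACHE_DIR` are just illustrative names.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total

# Assumption: cache location as seen in the screenshot; adjust if yours differs.
CACHE_DIR = "/root/.config/Google/DriveFS"

if os.path.isdir(CACHE_DIR):
    print(f"{CACHE_DIR}: {dir_size_bytes(CACHE_DIR) / 1e9:.2f} GB")
else:
    print(f"{CACHE_DIR} not found on this machine")
```

Running it in a cell before, during, and after a large read or write on the mount should show the cache growing and then shrinking, if the behaviour matches what I saw.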

I have not yet been able to find a work-around for this regrettable behaviour. In this specific case, being able to disable the cache would have been good, because the conversion process is read-once write-once.

ncdu screenshot showing colab / drive mount cache behaviour

Upvotes: 1

JohnK

Reputation: 407

Your Colab disk fills up as you read from or write to your mounted Google Drive because Colab caches Google Drive contents on its own local storage.

Cached items that aren't in use will be evicted as the Colab disk fills, but often the reason you're mounting Google Drive is that you want to have more files in use at once. Because of the file caching, you'll pretty much be out of luck there.

Upvotes: 3

Trect

Reputation: 2945

Google Drive storage and the Google Colab disk are different things. Google Drive is an object storage system, while the Google Colab disk is a file system that you use during the Colab runtime (mostly an SSD, or an HDD). They serve different purposes.

The problem you are facing is a shortage of SSD space during the runtime. Google Colab has no option to increase this. You would need to move to cloud notebooks to get a larger runtime SSD.

Upvotes: 0

Ajai

Reputation: 1169

The Google Drive storage and Google Colab disk space are different.

Google Drive storage is the space given to you in the Google cloud, whereas the Colab disk space is the amount of storage on the machine allotted to you at that time. You can increase the storage by changing the runtime.

A machine with a GPU has more memory and disk space than a CPU-only runtime. Similarly, if you want more, you can change the runtime to a TPU machine.
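To see how much local disk the current runtime actually has, you can run something like the sketch below in a cell. It only uses the standard library; `describe_disk` is a name made up for this example. Note that it reports the machine's local filesystem, not your Google Drive quota.

```python
import shutil

def describe_disk(path="/"):
    """Return (total, used, free) in GB for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    gb = 1e9
    return usage.total / gb, usage.used / gb, usage.free / gb

total, used, free = describe_disk("/")
print(f"local disk: {used:.1f} GB used of {total:.1f} GB ({free:.1f} GB free)")
```

Running it again after switching the runtime type lets you check whether the allotted disk really changed for your account.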

Upvotes: 8
