David L.

Reputation: 309

Delete files from Colaboratory without moving to Trash

I would like to immediately delete temporary files saved from a Google Colaboratory notebook without them going to the Trash.

I am using Keras + TensorFlow in my script and have it save the complete model after every training epoch. That way, if the script stops for any reason, I can restart it later, load the most recently saved model and continue training. To save disk space (the files go to my Google Drive), the script deletes the previous version of the model each time it saves a new one. I did this with the standard Python os.remove(), only to find out later that I had completely filled my Google Drive: on the mounted Drive, os.remove() just moves files to the Trash instead of actually deleting them.

I looked around and found references to the Google Drive API (used from Colab through PyDrive) saying that you have to call the Delete() method of the file object. However, getting a reference to the file object from just a file name seems ridiculously complicated, so I assume I am not doing it correctly. The code below is the workaround I came up with. A comment marks where I had to replace my one-liner with 25 lines of much less readable code.

I should also say that the documentation I found suggested I should be able to find the file with basically one call to gdrive.ListFile, using a query like "name='myfile'", but whenever I tried that I got HTTP request errors.

!pip install -U -q PyDrive
import os
from google.colab import drive
drive.mount('/content/gdrive')
workdir = '/content/gdrive/My Drive/work/2019.03.26.trackingML/eff100_inverted'
os.chdir(workdir)

epoch = 170
fname = 'model_checkpoints/model_epoch%03d.h5' % epoch

#--------------------------------------------------------
# Everything below here is to replace the one line:
# os.remove(fname)

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
gdrive = GoogleDrive(gauth)

# Find the Google Drive file object by walking the path one folder at a time.
# Skip '', 'content', 'gdrive' and 'My Drive': 'My Drive' is the Drive root itself.
fullpath = os.path.join(workdir, fname)
mydirs = fullpath.split('/')[4:]
curid = 'root'
found = None
for d in mydirs:
    found = None
    file_list = gdrive.ListFile({'q': "'%s' in parents and trashed=false" % curid}).GetList()
    for f in file_list:
        if f['title'] == d:
            found = f
            curid = f['id']
            break
    if found is None:
        break  # a path component is missing, so stop searching

if found is not None:
    print('Found file %s with id %s' % (found['title'], found['id']))
    found.Delete()  # permanently deletes the file instead of moving it to the Trash
else:
    print('Unable to find %s' % fname)

The above code does pretty much what I want, but it seems ugly and bloated. I'm hoping someone can point me to a one- or two-line replacement for os.remove() that doesn't fill my Trash (and quota).
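A possible tidier version of the same path walk, as a sketch rather than a definitive answer. It assumes PyDrive talks to the Drive API v2, where the file-name field in queries is `title` (`name` is the Drive API v3 spelling, which may be why the `name='myfile'` queries failed), and that Drive is mounted at `/content/gdrive/My Drive` as in the code above. Each path component is looked up with one `title = ... and ... in parents` query instead of listing every child. The helper names `quote`, `drive_parts`, `find_drive_file` and `remove_permanently` are hypothetical, not part of PyDrive.

```python
import os

# Assumed Colab mount point, taken from the question's code.
MOUNT_ROOT = '/content/gdrive/My Drive'


def quote(value):
    """Quote a string literal for a Drive API query (escape backslashes and ')."""
    return "'" + value.replace('\\', '\\\\').replace("'", "\\'") + "'"


def drive_parts(fullpath, mount_root=MOUNT_ROOT):
    """Turn a path under the Drive mount into the list of Drive titles to walk.

    'My Drive' is the Drive root itself, so it is not looked up.
    """
    rel = os.path.relpath(fullpath, mount_root)
    if rel == '..' or rel.startswith('../'):
        raise ValueError('%s is not under %s' % (fullpath, mount_root))
    return [p for p in rel.split('/') if p not in ('', '.')]


def find_drive_file(gdrive, fullpath, mount_root=MOUNT_ROOT):
    """Return the PyDrive file object for fullpath, or None if any part is missing.

    One Drive API v2 query per path component: match by title inside the
    current parent folder. If several items share a title, the first is used.
    """
    found, parent_id = None, 'root'
    for title in drive_parts(fullpath, mount_root):
        q = 'title = %s and %s in parents and trashed = false' % (
            quote(title), quote(parent_id))
        matches = gdrive.ListFile({'q': q}).GetList()
        if not matches:
            return None
        found = matches[0]
        parent_id = found['id']
    return found


def remove_permanently(gdrive, fullpath, mount_root=MOUNT_ROOT):
    """Delete fullpath from Drive without moving it to the Trash; True if found."""
    f = find_drive_file(gdrive, fullpath, mount_root)
    if f is None:
        return False
    f.Delete()  # PyDrive's GoogleDriveFile.Delete() deletes permanently
    return True
```

With the helpers defined once, the call site becomes `remove_permanently(gdrive, os.path.join(workdir, fname))`, where `gdrive` is the PyDrive client created in the question. Some Colab versions mount the drive folder as `MyDrive` instead of `My Drive`; in that case pass a matching `mount_root`.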

Upvotes: 4

Views: 4686

Answers (2)

Vítor Manfredini

Reputation: 75

I solved this problem with:

!echo '' > file-to-delete && rm file-to-delete

The file still goes to the Trash, but since it is empty it won't use up your storage quota. :)
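The same trick can be done from Python, which may fit better in a training script than a shell command. This is a minimal sketch under the same assumption the answer relies on: that the Drive sync picks up the truncation, so the copy that lands in the Trash is empty. Unlike `echo ''`, which leaves a single newline in the file, opening in `'w'` mode truncates it to 0 bytes. The name `remove_emptied` is hypothetical.

```python
import os


def remove_emptied(path):
    """Empty a file, then remove it.

    On the Colab Drive mount, os.remove() moves the file to the Drive Trash;
    truncating it first means the trashed copy should hold (almost) no data.
    """
    with open(path, 'w'):
        pass  # opening in 'w' mode truncates the file to 0 bytes
    os.remove(path)
```

In the question's script this would replace `os.remove(fname)` with `remove_emptied(fname)`. The emptied files still accumulate in the Trash until it is emptied.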

Upvotes: 1

s.abbaasi

Reputation: 1114

Suppose your checkpoint file names start with "model_epoch".

1) In Colab, run these statements in a cell at the beginning:

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

2) Go to Drive, right-click the folder that contains the checkpoint files and select "Get shareable link". A link containing the folder's id will be copied.

3) In Colab, define this function in a cell:

def clearCheckPointFiles():
  # List the non-trashed files in the checkpoint folder (id from step 2)
  file_list = drive.ListFile({'q': "'*******************' in parents and trashed=false"}).GetList()
  for f in file_list:
    # Delete() removes the file permanently instead of moving it to the Trash
    if f['title'].startswith('model_epoch'):
      f.Delete()

4) Replace the asterisks with the folder id from the link copied in step 2.

5) Call clearCheckPointFiles() just before saving each new checkpoint.

6) Enjoy!
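One caveat with step 5: deleting every checkpoint before saving the new one means a crash in between leaves no checkpoint at all. A possible variant, sketched under the assumption that files are named like `model_epoch170.h5` (as in the question), runs after the save and keeps the newest `keep` checkpoints. The names `stale_checkpoints`, `prune_checkpoints` and `folder_id` are hypothetical; `gdrive` is a PyDrive `GoogleDrive` client like `drive` from step 1.

```python
import re


def stale_checkpoints(titles, prefix='model_epoch', keep=1):
    """Return checkpoint titles to delete, keeping the `keep` highest epochs.

    Assumes titles look like '<prefix><digits>.h5', e.g. 'model_epoch170.h5';
    other titles are ignored.
    """
    pattern = re.compile(r'^%s(\d+)\.h5$' % re.escape(prefix))
    epochs = []
    for t in titles:
        m = pattern.match(t)
        if m:
            epochs.append((int(m.group(1)), t))
    epochs.sort()  # oldest epoch first
    if keep <= 0:
        return [t for _, t in epochs]
    return [t for _, t in epochs[:-keep]]


def prune_checkpoints(gdrive, folder_id, prefix='model_epoch', keep=1):
    """Permanently delete all but the newest `keep` checkpoints in one Drive folder."""
    files = gdrive.ListFile(
        {'q': "'%s' in parents and trashed=false" % folder_id}).GetList()
    doomed = set(stale_checkpoints([f['title'] for f in files], prefix, keep))
    for f in files:
        if f['title'] in doomed:
            f.Delete()  # PyDrive: permanent delete, not a move to the Trash
    return sorted(doomed)
```

Usage would be `model.save(fname)` followed by `prune_checkpoints(drive, folder_id)`. If the new file has not yet synced to Drive when the listing runs, it is simply not listed, so the previous checkpoint is kept as the newest and nothing needed is lost.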

Upvotes: 2
