Reputation: 309
I would like to immediately delete temporary files saved from a Google Colaboratory notebook without them going to the Trash.
I am using Keras+TensorFlow in my script and have it save the complete model after every epoch of training. The main reason is that if the script is stopped for any reason, I can restart it later and it will read in the most recently saved model and continue training. To save disk space (it is using my Google Drive), I have it delete the previous version of the model every time it saves a new one. I did this with the standard Python os.remove(), only to find out later that I had completely filled my Google Drive, because os.remove() just moves the files to the Trash folder and does not actually delete them.
I looked around and found references to the Google Colab API that said you have to call the Delete method of the file object. However, getting a reference to the file object from just a file name seems ridiculously complicated, so I assume I am not doing it correctly. The code below is the workaround I came up with. A comment marks where I had to replace my one-liner with 25 lines of much less readable code.
I should also say that the documentation I found kept indicating that I should be able to find the file in basically one call to gdrive.ListFile using something like "name='myfile'", but whenever I tried that, I kept getting HTTP errors.
!pip install -U -q PyDrive
import os
from google.colab import drive
workdir = '/content/gdrive/My Drive/work/2019.03.26.trackingML/eff100_inverted'
os.chdir( workdir )
epoch = 170
fname = 'model_checkpoints/model_epoch%03d.h5' % (epoch)
# Everything below here is to replace the one line:
# os.remove(fname)
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
gdrive = GoogleDrive(gauth)
# Find the Google Drive file object by walking the path one component at a time
fullpath = os.path.join(workdir, fname)
mydirs = fullpath.split('/')[3:]
curid = 'root'
for d in mydirs:
    file_list = gdrive.ListFile({'q': "'%s' in parents and trashed=false" % curid}).GetList()
    for file in file_list:
        if file['title'] == d:
            curid = file['id']
            break
# Delete the file by id; Delete() removes it permanently instead of moving it to the Trash
if fname.endswith(file['title']):
    print('Found file %s with id %s' % (file['title'], file['id']))
    file.Delete()
else:
    print('Unable to find %s' % fname)
The above code pretty much does what I want, but seems ugly and bloated. I'm hoping someone can point me to a one- or two-line replacement for os.remove() that avoids filling my Trash (and quota).
Upvotes: 4
Views: 4686
Reputation: 75
I solved this problem with:
!echo '' > file-to-delete && rm file-to-delete
It will still move it to the trash but the file will be empty so you won't run out of space. :)
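For use inside a training script, the same trick can be sketched in plain Python. The helper name truncate_and_remove is hypothetical, and the sketch assumes, as the shell version does, that Drive stores the emptied file in the Trash rather than the original contents.

```python
import os


def truncate_and_remove(path):
    """Empty the file at `path`, then remove it (hypothetical helper name)."""
    # Opening in 'w' mode truncates the file to zero bytes.
    with open(path, 'w'):
        pass
    # Assumption: on a mounted Drive this still sends the (now empty) file to the Trash.
    os.remove(path)
```

Calling truncate_and_remove(fname) in place of os.remove(fname) would be the drop-in use.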
Upvotes: 1
Reputation: 1114
Suppose that your checkpoint file names start with "model_epoch".
1) In Colab, put these statements in a cell at the beginning:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
2) Go to Drive, right-click the folder that contains the checkpoint files, and select Get shareable link. A link containing the folder id will be copied.
3) In Colab, write this function in a cell:
def clearCheckPointFiles():
    # List every non-trashed file in the checkpoint folder
    file_list = drive.ListFile({'q': "'*******************' in parents and trashed=false"}).GetList()
    for f in file_list:
        if f['title'].startswith('model_epoch'):
            # Delete() removes the file permanently instead of moving it to the Trash
            drive.CreateFile({'id': f['id']}).Delete()
4) Replace ******************* with the folder id from the link copied in step 2.
5) Call clearCheckPointFiles() just before saving a new checkpoint.
6) Enjoy!
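If you want to delete one specific file rather than every checkpoint, a single ListFile call can work too. PyDrive wraps version 2 of the Drive API, where the searchable field is title rather than name, which is a likely reason a "name='myfile'" query returns HTTP errors. The sketch below is only an illustration: build_query and delete_by_title are hypothetical helper names, and drive is assumed to be the authenticated GoogleDrive client from step 1.

```python
def build_query(title, parent_id):
    """Build a Drive API v2 search query for non-trashed files with this exact title."""
    # Backslashes and single quotes inside the title must be escaped in the query.
    escaped = title.replace("\\", "\\\\").replace("'", "\\'")
    return "title = '%s' and '%s' in parents and trashed = false" % (escaped, parent_id)


def delete_by_title(drive, title, parent_id):
    """Permanently delete every matching file and return how many were deleted.

    `drive` is assumed to be an authenticated pydrive.drive.GoogleDrive instance.
    """
    matches = drive.ListFile({'q': build_query(title, parent_id)}).GetList()
    for f in matches:
        f.Delete()  # Delete() bypasses the Trash; Trash() would only move the file there
    return len(matches)
```

With the folder id from step 2, a call such as delete_by_title(drive, 'model_epoch170.h5', folder_id) would replace the loop for a single file.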
Upvotes: 2