Michal

Reputation: 464

Possibility to save uploaded data in Google Colab for reopening

I recently started solving Kaggle competitions, working on two computers (a laptop and a PC). Kaggle provides large amounts of data for training ML models.

The biggest problem for me is downloading that data (about 30 GB) and, an even bigger issue, unzipping it. I was working on my laptop but decided to move to the PC, so I saved the .ipynb file and closed the laptop.

When I reopened the file, I saw that all the unzipped data was gone and I had to spend another two hours downloading and unzipping it.

Is it possible to save all the unzipped data together with this notebook? Or is it perhaps stored somewhere on Google Drive?

Upvotes: 2

Views: 785

Answers (1)

natz

Reputation: 731

You can leverage the storage capacity of Google Drive. Colab allows you to keep this data on your Drive and access it from a Colab notebook as follows:

from google.colab import drive
import matplotlib.image as mpimg
import pandas as pd

drive.mount('/content/gdrive')  # mounts your Drive under /content/gdrive
img = mpimg.imread('/content/gdrive/My Drive/top.bmp')   # reading an image file
df = pd.read_csv('/content/gdrive/My Drive/myData.csv')  # loading a CSV

When it mounts, it asks you to visit a particular URL to grant permission to access your Drive; just paste back the token it returns. This only needs to be done once per runtime.

The best thing about Colab is that you can also run shell commands from code cells; all you need to do is prefix the command with a ! (bang). This is useful when you need to unzip files, etc.

import os
os.chdir('/content/gdrive/My Drive/data')  # change working directory
!ls                       # list the files there
!unzip -q iris_data.zip   # quietly extract the archive
df3 = pd.read_csv('/content/gdrive/My Drive/data/iris_data.csv')
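If you prefer to stay in pure Python (or the `unzip` binary is unavailable), the standard library's `zipfile` module does the same extraction. A minimal, self-contained sketch; the archive and file names are placeholders, and the example creates a tiny demo zip first so it can run anywhere:

```python
import zipfile

# Create a tiny demo archive so the example is self-contained
# (in practice you would already have the downloaded .zip).
with zipfile.ZipFile('iris_data.zip', 'w') as zf:
    zf.writestr('iris_data.csv', 'sepal_length,species\n5.1,setosa\n')

# Extract the archive into the current directory.
with zipfile.ZipFile('iris_data.zip') as zf:
    zf.extractall('.')
    print(zf.namelist())  # ['iris_data.csv']
```

Extracting directly into a path under `/content/gdrive/My Drive/...` would make the unzipped files persist on Drive between sessions.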

Note: since you specified that the data is about 30 GB, this may not be useful if you are on Google's free tier (which gives only 15 GB per account); you may have to look elsewhere.
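Before extracting, you can check how much space is actually free at the mount point with the standard library's `shutil.disk_usage`. A quick sketch; the path `'/'` is used here so it runs anywhere, but pointing it at `/content/gdrive` (assuming the mount shown above) reports your Drive quota instead:

```python
import shutil

# Query total/used/free bytes for the filesystem containing the path.
usage = shutil.disk_usage('/')
print(f'free: {usage.free / 2**30:.1f} GiB')
```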

You can also visit this question for more solutions on integrating Kaggle with Google Colab.

Upvotes: 3
