hdiz

Reputation: 1161

Using Kaggle Datasets in Google Colab

Is it possible to use any datasets available via the kaggle API in Google Colab? I see the Kaggle API is used in this Colab notebook, but it's a bit unclear to me what datasets it provides access to.

Upvotes: 61

Views: 53344

Answers (13)

Avocano

Reputation: 379

After steps 1-6 from Bob Smith's answer, you can download the dataset for a particular competition in Colab with this command:

!kaggle competitions download -c elo-merchant-category-recommendation

Here, elo-merchant-category-recommendation is the name of the competition.

Upvotes: 1

aravinda_gn

Reputation: 1360

A quick guide to use Kaggle datasets inside Google Colab using Kaggle API

(1) Download the Kaggle API token.

  • Go to “Account”, go down the page, and find the “API” section.
  • Click the “Create New API Token” button.
  • The “kaggle.json” file will be downloaded.

(2) Mount the Google drive to the Colab notebook.

  • This gives the Colab notebook access to the files in your Google Drive.

from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)

(3) Upload the “kaggle.json” file into the folder in google drive where you want to download the Kaggle dataset.

(4) Install Kaggle API.

!pip install kaggle

(5) Change the current working directory to where you want to download the Kaggle dataset.

%cd /content/gdrive/MyDrive/DataSets/house_price_data/

(6) Run the following code to configure the path to “kaggle.json”.

import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/MyDrive/DataSets/house_price_data/"

(7) Download the dataset.

!kaggle competitions download -c house-prices-advanced-regression-techniques
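
Before running the download in step (7), it can help to confirm that the token from step (3) really sits in the folder that KAGGLE_CONFIG_DIR points to. The following is only a minimal sketch and not part of the Kaggle API: the helper name check_kaggle_config is hypothetical, and it assumes nothing more than that kaggle.json is a JSON object with "username" and "key" fields.

```python
import json
import os


def check_kaggle_config(config_dir):
    """Return True if config_dir holds a kaggle.json with a username and key.

    Hypothetical helper for illustration; it never contacts Kaggle.
    """
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        return False
    try:
        with open(token_path) as fh:
            token = json.load(fh)
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed file counts as "not configured".
        return False
    return (
        isinstance(token, dict)
        and bool(token.get("username"))
        and bool(token.get("key"))
    )


# Check the folder named in KAGGLE_CONFIG_DIR, falling back to ~/.kaggle.
config_dir = os.environ.get("KAGGLE_CONFIG_DIR", os.path.expanduser("~/.kaggle"))
print("kaggle.json looks usable:", check_kaggle_config(config_dir))
```

If this prints False, recheck the upload in step (3) and the path in step (6) before trying the download.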

Upvotes: 1

Farhan Hai Khan

Reputation: 808


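import json
import os
import shutil

# Write the API token to /content/.kaggle/kaggle.json.
# exist_ok=True lets the cell be rerun without raising FileExistsError.
os.makedirs("/content/.kaggle/", exist_ok=True)
token = {"username": "your_username_here", "key": "your_kaggle_key_here"}
# Mode 'w' (not 'a+') so a rerun overwrites the file instead of appending a second JSON object.
with open("/content/.kaggle/kaggle.json", "w") as file:
    json.dump(token, file)

# Copy the token to /.kaggle as well.
os.makedirs("/.kaggle/", exist_ok=True)
src = "/content/.kaggle/kaggle.json"
des = "/.kaggle/kaggle.json"
shutil.copy(src, des)

# Copy the token to ~/.kaggle (here /root/.kaggle).
os.makedirs("/root/.kaggle/", exist_ok=True)
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json

# Set the download path to /content.
!kaggle config set -n path -v /content

# Source: https://towardsdatascience.com/setting-up-kaggle-in-google-colab-ebb281b61463

!kaggle datasets download -d xhlulu/siim-covid19-resized-to-512px-png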

Works for me on Colab as of 29-05-21!

Upvotes: 0

flipcc

Reputation: 105

I find the accepted answer to be very comprehensive, but would like to add that:

!kaggle competitions download -c dogs-vs-cats

or most other downloads still won't work. You will probably get the following error:

403 - Forbidden

which is not very verbose. What it means is: "Please visit kaggle.com and accept the rules (e.g. for that competition)." You cannot accept them through the API! This is explicitly stated in the docs (see Public API documentation | Kaggle):

Just like participating in a Competition normally through the user interface, you must read and accept the rules in order to download data or make submissions. You cannot accept Competition rules via the API. You must do this by visiting the Kaggle website and accepting the rules there.
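
If you drive the CLI from Python, you can turn that terse message into something more useful. This is just a sketch under my own assumptions: explain_kaggle_error and run_kaggle are hypothetical helper names, and the check only looks for the "403" text quoted above rather than any documented error code.

```python
import subprocess


def explain_kaggle_error(output):
    """Map the terse 403 message from the Kaggle CLI to a more helpful hint.

    Hypothetical helper: it only searches the output text for "403".
    """
    if "403" in output:
        return (
            "403 - Forbidden: open the competition page on kaggle.com, "
            "join it and accept the rules, then rerun the download."
        )
    return output


def run_kaggle(args):
    """Run the kaggle CLI with the given arguments and explain a 403 if one occurs."""
    result = subprocess.run(["kaggle", *args], capture_output=True, text=True)
    return explain_kaggle_error(result.stdout + result.stderr)


# The explanation works on any captured output, so it can be tried without the CLI:
print(explain_kaggle_error("403 - Forbidden"))

# In Colab, with the token in place:
# print(run_kaggle(["competitions", "download", "-c", "dogs-vs-cats"]))
```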

Yes, this could have been a comment, but I do not have enough reputation to comment.

Upvotes: 1

Emre

Reputation: 1103

A hacky way:

  1. Go to the dataset page after logging in.
  2. Open Chrome Developer Tools, then go to the Network pane.
  3. Click the Download button on Kaggle.
  4. You will then see many requests in the Network pane; find the request starting with archive.zip.
  5. Right-click that request, then choose Copy -> Copy as cURL (bash). The command is now on your clipboard.
  6. In Colab, paste the command, prepend an ! to it, then run it.

This is definitely less reliable than the API, but it remains an option.

Upvotes: 0

MarcusRB

Reputation: 68

The most important part comes before downloading the files:

On the Kaggle website, on the competition page, you must click:

Late Submission or Join Competition

and

ACCEPT RULES AND CONDITIONS ON THE KAGGLE COMPETITION WEBPAGE

If you do not, then after copying the API file and starting the dataset download, you get a 403 error.

Upvotes: 0

Noah Sheldon

Reputation: 1652

Detailed approach:

  1. Go to My Account in your profile.

  2. Scroll down until you find the option Create New API Token; this will download a file called kaggle.json.

  3. Go to Colab and upload the file kaggle.json.

  4. pip install kaggle

  5. Create a new folder named kaggle, copy kaggle.json into the kaggle folder, and set read-write permissions for your user only.

  6. Go to the Kaggle website. For the data you want to download, click the three dots on the right-hand side of the screen, then click Copy API command.

  7. Go to Colab and paste the API command.

  8. When you run !ls, you will see that the download is a zip file.

  9. Unzip the file.

  10. Now, when you run !ls, you will find the CSV file extracted from the zip file.

  11. To read the file, import pandas and call pd.read_csv.

  12. As you can see, the file has been read into Colab successfully.

This downloads the kaggle dataset into google colab, where you can perform analysis and build amazing machine learning models or train neural networks.
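
The unzip-and-read part of the walkthrough can be sketched as below, using only the Python standard library so it runs anywhere. The archive name, the file name train.csv and its contents are made up for the demo; in Colab you would point it at the zip you actually downloaded and would probably use pd.read_csv on the extracted file instead of the csv module.

```python
import csv
import os
import tempfile
import zipfile


def extract_and_read_csv(zip_path, csv_name, dest_dir):
    """Extract zip_path into dest_dir and return the rows of csv_name as dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    with open(os.path.join(dest_dir, csv_name), newline="") as fh:
        return list(csv.DictReader(fh))


# Self-contained demo: build a tiny stand-in archive instead of a real download.
with tempfile.TemporaryDirectory() as work:
    zip_path = os.path.join(work, "example.zip")  # hypothetical archive name
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("train.csv", "id,price\n1,100\n2,250\n")
    rows = extract_and_read_csv(zip_path, "train.csv", os.path.join(work, "data"))
    print(len(rows), "rows; first row:", rows[0])
```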

Happy Analysis!!!

Upvotes: 7

Priyansh gupta

Reputation: 916

To download competition data from Kaggle in Google Colab: I was working in Google Colab and ran into the same problem, but I did two things.

First, you have to register your mobile number along with your country code. Second, you have to click on Late Submission on the Kaggle dataset page. Then download the kaggle.json file from Kaggle and upload it to Google Colab. After that, run the code below in Google Colab.

!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/ 
!chmod 600 ~/.kaggle/kaggle.json 
!kaggle competitions download -c web-traffic-time-series-forecasting
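
The token placement (mkdir, cp, chmod 600) can also be done from Python, which is handy if you want the setup in a reusable function. This is only a sketch: install_kaggle_token is a hypothetical helper name, and the home directory is passed in explicitly so you can try it on a scratch folder first.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path, home_dir):
    """Copy kaggle.json into <home_dir>/.kaggle and restrict it to the owner."""
    kaggle_dir = os.path.join(home_dir, ".kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)  # like: mkdir -p ~/.kaggle
    target = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copy(token_path, target)  # like: cp kaggle.json ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # like: chmod 600
    return target


# In Colab, after uploading kaggle.json to the working directory:
# install_kaggle_token("kaggle.json", os.path.expanduser("~"))
```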

Upvotes: 1

CypherX

Reputation: 7353

I combined the top response into this GitHub gist as a Colab implementation. You can copy the code directly and use it.

How to Import a Dataset from Kaggle in Colab

Method:

First a few things you have to do:

  1. Sign up for Kaggle
  2. Sign up for a competition you want to access data from (for example LANL-Earthquake-Prediction competition).
  3. Download your credentials to access Kaggle API as kaggle.json

# Install kaggle packages
!pip install -q kaggle
!pip install -q kaggle-cli
# Colab's file access feature
from google.colab import files

# Upload `kaggle.json` file
uploaded = files.upload()
# Retrieve uploaded file
# print results
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

# Then copy kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle

Now check if it worked!

#list competitions
!kaggle competitions list -s LANL-Earthquake-Prediction

Upvotes: 3

Seunghun Sunmoon Lee

Reputation: 469

First of all, run this command to find out where this Colab file lives and how it executes:

!ls -d $PWD/*

It will show:

/content/data /content/gdrive /content/models

In other words, your working directory (pwd) is /content/, so when you run !ls, it will show data gdrive models. FYI, ! allows you to run Linux commands inside Colab.

Colab keeps cleaning up the /content folder. Therefore, in every new session, the downloaded datasets and the kaggle.json file will be gone. That's why it's important to automate the process, so you can focus on writing code rather than setting up the environment every time.

Run this in a Colab code cell as an example, with your own API key. Open your kaggle.json file to find your username and key.

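# Info on how to get your api key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
import json
import zipfile
import os
# Fill in the username and key from your own kaggle.json.
api_token = {"username":"seunghunsunmoonlee","key":""}
# Create the folder first; open() does not create missing directories.
os.makedirs('/content/.kaggle', exist_ok=True)
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
# Assumption: point the client at this folder in case ~/.kaggle is not /content/.kaggle.
os.environ['KAGGLE_CONFIG_DIR'] = '/content/.kaggle'
# On newer CLI versions this may need to be `kaggle config set -n path -v /content`, as in another answer here.
!kaggle config path -p /content
!kaggle competitions download -c dog-breed-identification
os.chdir('/content/competitions/dog-breed-identification')
for file in os.listdir():
    # Skip anything that is not a zip archive.
    if zipfile.is_zipfile(file):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()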

Then run !ls again. You will see all data you need. Hope it helps!

Upvotes: 1

Bob Smith

Reputation: 38579

Step-by-step --

  1. Create an API key in Kaggle.

    To do this, go to kaggle.com/ and open your user settings page.

  2. Next, scroll down to the API access section and click generate to download an API key. This will download a file called kaggle.json to your computer. You'll use this file in Colab to access Kaggle datasets and competitions.

  3. Navigate to https://colab.research.google.com/.

  4. Upload your kaggle.json file using the following snippet in a code cell:

    from google.colab import files
    files.upload()

  5. Install the kaggle API using !pip install -q kaggle

  6. Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:

    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/

  7. Now you can access datasets using the client, e.g., !kaggle datasets list.

Here's a complete example notebook of the Colab portion of this process: https://colab.research.google.com/drive/1DofKEdQYaXmDWBzuResXWWvxhLgDeVyl

This example shows uploading the kaggle.json file, installing the Kaggle API client, and using the Kaggle client to download a dataset.

Upvotes: 137

Prakash Gupta

Reputation: 104

Have a look at this.

It uses the official Kaggle API behind the scenes, but automates the process so you don't have to re-download manually every time your VM is taken away. Another issue I faced when using the Kaggle API directly on Colab was the hassle of transferring the Kaggle API token via Google Drive. The above method automates that as well.

Disclaimer: I am one of the creators of Clouderizer.

Upvotes: 2

Rachael Tatman

Reputation: 889

You should be able to access any dataset on Kaggle via the API. In this example, only the datasets for competitions are being listed. You can see which datasets you can access with this command:

kaggle datasets list

You can also search for datasets by adding the -s flag and then the search term you're interested in. So this would give you a list of datasets about dogs:

kaggle datasets list -s dogs
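
If you would rather build these commands from Python (for example to loop over several search terms), something like the following works as a rough sketch. The helper names are hypothetical; the only CLI pieces used are the kaggle datasets list command and the -s flag shown above, and the actual call is left to a separate function so nothing runs unless the CLI is installed.

```python
import shutil
import subprocess


def kaggle_datasets_list_command(search=None):
    """Build the argument list for `kaggle datasets list`, optionally with -s."""
    cmd = ["kaggle", "datasets", "list"]
    if search:
        cmd += ["-s", search]
    return cmd


def list_datasets(search=None):
    """Run the command and return its text output, or None if the CLI is missing."""
    if shutil.which("kaggle") is None:
        return None
    result = subprocess.run(
        kaggle_datasets_list_command(search), capture_output=True, text=True
    )
    return result.stdout or result.stderr


# Show the command that would be run for a search on "dogs".
print(" ".join(kaggle_datasets_list_command("dogs")))
```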

You can find more information on the API and how to use it in the documentation here.

Hope that helps! :)

Upvotes: 21
