Yana

Reputation: 975

How to read file from Kaggle in Jupyter Notebook in Microsoft Azure?

I am using Jupyter Notebook in Microsoft Azure. Since I cannot upload big files in Azure, I need to read it from a link. The csv file I want to read is in Kaggle.

I did this:

!pip install kaggle

import os

os.environ['KAGGLE_USERNAME'] = "*********"

os.environ['KAGGLE_KEY'] = "*********"

import kaggle

But I don't know how to read the file now. In other cases I use pandas to read files: file = pd.read_csv("file/link") and then I am able to clean and organize my data. But it is not working in this situation. Could you please help me?

I want to be able to read and manipulate the data as with the pd.read_csv because I need it for my project in Data Science. This is the dataset I want to be able to work with: https://www.kaggle.com/START-UMD/gtd#globalterrorismdb_0718dist.csv

Upvotes: 3

Views: 3208

Answers (1)

Ankush Chauhan

Reputation: 93

Kaggle provides extensive documentation for its command-line API. The tool is written in Python and its source is publicly available, so it is straightforward to work out how to call the Kaggle API directly from Python.

Assuming you've already exported the username and key as environment variables:

import os
os.environ['KAGGLE_USERNAME'] = '<kaggle-user-name>'
os.environ['KAGGLE_KEY'] = '<kaggle-key>'
os.environ['KAGGLE_PROXY'] = '<proxy-address>'  # skip this line if you are not working behind a firewall

or you've downloaded kaggle.json from the API section of your Kaggle account page and copied it to ~/.kaggle/ (the Kaggle configuration directory on your system).
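If you cannot write to ~/.kaggle/ in a hosted notebook, one option is to upload kaggle.json next to the notebook and copy its contents into the environment variables yourself. This is only a minimal sketch: it assumes the file has the usual "username" and "key" fields, and load_kaggle_credentials is a hypothetical helper name, not part of the kaggle package.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(json_path):
    """Copy the credentials stored in a kaggle.json file into environment variables.

    Hypothetical helper (not part of the kaggle package). Assumes the file
    contains the "username" and "key" fields.
    """
    creds = json.loads(Path(json_path).read_text())
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]


# Usage: call this before importing kaggle, e.g.
# load_kaggle_credentials("kaggle.json")
```

Setting the variables before the import matters because the package looks for credentials when it authenticates.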

Then, you can use the following code in your Jupyter notebook to load this dataset to a pandas dataframe:

  1. Import libraries
import kaggle as kg
import pandas as pd

  2. Download the dataset locally
kg.api.authenticate()
# path is a directory; with unzip=True the files are extracted into it
kg.api.dataset_download_files(dataset="START-UMD/gtd", path='gtd', unzip=True)

  3. Read the downloaded dataset
df = pd.read_csv('gtd/globalterrorismdb_0718dist.csv', encoding='ISO-8859-1')
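If you do not want to hard-code the CSV file name, the same steps can be wrapped in a small function that downloads the dataset and reads the first CSV it finds. This is a rough sketch rather than a definitive implementation: find_csv_files and load_kaggle_csv are hypothetical helper names, the "gtd" destination directory is an arbitrary choice, and the download itself needs network access and valid credentials.

```python
from pathlib import Path


def find_csv_files(directory):
    """Return the CSV files found directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.csv"))


def load_kaggle_csv(dataset, dest="gtd", encoding="ISO-8859-1"):
    """Download a Kaggle dataset into `dest` and read its first CSV file.

    Hypothetical helper. kaggle and pandas are imported inside the function
    so the credentials can be set before the kaggle package is loaded.
    """
    import kaggle
    import pandas as pd

    kaggle.api.authenticate()
    # `dest` is treated as a directory; unzip=True extracts the files into it.
    kaggle.api.dataset_download_files(dataset=dataset, path=dest, unzip=True)

    csv_files = find_csv_files(dest)
    if not csv_files:
        raise FileNotFoundError(f"no CSV files found in {dest!r}")
    return pd.read_csv(csv_files[0], encoding=encoding)


# Usage (requires network access and Kaggle credentials):
# df = load_kaggle_csv("START-UMD/gtd")
```

The result is an ordinary pandas DataFrame, so it can be cleaned and manipulated exactly as if it had been read with pd.read_csv from a local path.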

Upvotes: 3
