Reputation: 975
I am using Jupyter Notebook in Microsoft Azure. Since I cannot upload big files to Azure, I need to read the data from a link. The CSV file I want to read is on Kaggle.
I did this:
!pip install kaggle
import os
os.environ['KAGGLE_USERNAME'] = "*********"
os.environ['KAGGLE_KEY'] = "*********"
import kaggle
But I don't know how to read the file now.
In other cases I use pandas to read files:
file = pd.read_csv("file/link")
and then I am able to clean and organize my data.
But it is not working in this situation.
Could you please help me?
I want to be able to read and manipulate the data just as I would with pd.read_csv, because I need it for my data science project. This is the dataset I want to work with: https://www.kaggle.com/START-UMD/gtd#globalterrorismdb_0718dist.csv
Upvotes: 3
Views: 3208
Reputation: 93
Kaggle provides extensive documentation for its command-line API. The API is written in Python and its source is publicly available, so it is straightforward to read the source and work out how to call the Kaggle API directly from Python.
Assuming you've already exported the username and key as environment variables:
import os
os.environ['KAGGLE_USERNAME'] = '<kaggle-user-name>'
os.environ['KAGGLE_KEY'] = '<kaggle-key>'
os.environ['KAGGLE_PROXY'] = '<proxy-address>'  # skip this line if you are not working behind a firewall
or you've successfully downloaded kaggle.json from the API section of your Kaggle Account page and copied it to ~/.kaggle/, i.e. the Kaggle configuration directory on your system.
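Either precondition can be checked before calling the API. The following is only a minimal sketch using the standard library: the helper name kaggle_credentials_available is hypothetical (it is not part of the kaggle package), and it assumes the two credential sources described above, namely the KAGGLE_USERNAME and KAGGLE_KEY environment variables or a kaggle.json file in ~/.kaggle/.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, config_dir=None):
    """Return True if either credential source described above is present.

    Hypothetical helper, not part of the kaggle package.
    env        -- mapping to inspect (defaults to os.environ)
    config_dir -- directory expected to hold kaggle.json (defaults to ~/.kaggle)
    """
    env = os.environ if env is None else env
    config_dir = Path.home() / ".kaggle" if config_dir is None else Path(config_dir)

    # Source 1: both environment variables are set to non-empty values.
    has_env = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))
    # Source 2: the kaggle.json token file exists in the configuration directory.
    has_file = (config_dir / "kaggle.json").is_file()

    return has_env or has_file
```

Calling kaggle_credentials_available() with no arguments inspects the current environment and home directory; it only checks that the credentials exist, not that they are valid.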
Then, you can use the following code in your Jupyter notebook to load this dataset into a pandas DataFrame:
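import kaggle as kg
import pandas as pd
# Authenticate using the environment variables or ~/.kaggle/kaggle.json
kg.api.authenticate()
# path is the destination directory (named 'gt.zip' here); unzip=True extracts the archive into it
kg.api.dataset_download_files(dataset="START-UMD/gtd", path='gt.zip', unzip=True)
# Read the extracted CSV from that directory; the file is not UTF-8 encoded
df = pd.read_csv('gt.zip/globalterrorismdb_0718dist.csv', encoding='ISO-8859-1')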
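If you are not sure what the extracted file is called, you can list the CSV files in the download directory instead of hard-coding the name. This is only a sketch using the standard library: find_csv_files is a hypothetical helper name, and it assumes the dataset has already been downloaded and unzipped into the directory you passed as path.

```python
from pathlib import Path


def find_csv_files(download_dir):
    """Return every .csv file under download_dir (searched recursively), sorted by path.

    Hypothetical helper; it only inspects the local file system.
    """
    return sorted(p for p in Path(download_dir).rglob("*.csv") if p.is_file())


# Example usage after the download above (left commented out because it needs the data):
# csv_paths = find_csv_files('gt.zip')
# df = pd.read_csv(csv_paths[0], encoding='ISO-8859-1')
```

The helper returns an empty list when no CSV file is found, so check the result before indexing into it.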
Upvotes: 3