sbpkoundinya

Reputation: 13

Issue in extracting Titanic training data from Kaggle using Jupyter Notebook

I'm trying to download the Titanic training and test data from Kaggle in a Jupyter Notebook. My code snippet is below.

import os
from requests import session

payload = {
    'action': 'login',
    'username': os.environ.get("KAGGLE_USERNAME"),
    'password': os.environ.get("KAGGLE_PASSWORD")
}

url = "https://www.kaggle.com/c/3136/download/train.csv"

with session() as c:
    c.post('https://www.kaggle.com/account/login', data=payload)
    response = c.get(url)
    print(response.text)

After executing this, I get an HTML response instead of the training data. I have configured my Kaggle login credentials in a .env file as well. Did I do something wrong here?

Upvotes: 1

Views: 1631

Answers (1)

h0r53

Reputation: 3229

The site you are interested in uses anti-forgery tokens to prevent attacks such as cross-site request forgery (CSRF). Your login request did not succeed, which is why your script got an HTML page back instead of the CSV. The tokens are an obstacle, but nothing we cannot overcome with Python. I made an account and am successfully pulling down the CSV data you want with the following script. Note: I had to parse the anti-forgery token out of the login page, and my code for that is a bit messy, but it works.

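import requests

# Form fields sent to the login endpoint; the token is filled in below.
payload = {
    '__RequestVerificationToken': '',
    'username': 'OMITTED',
    'password': 'OMITTED',
    'rememberme': 'false'
}

loginURL = 'https://www.kaggle.com/account/login'
dataURL = "https://www.kaggle.com/c/3136/download/train.csv"

with requests.Session() as c:
    # Fetch the login page first; the session keeps its cookies for later requests.
    response = c.get(loginURL).text
    # Slice the token out of the page source. The fixed offsets depend on the
    # exact page layout and will break if that layout changes.
    AFToken = response[response.index('antiForgeryToken')+19:response.index('isAnonymous: ')-12]
    print("AntiForgeryToken={}".format(AFToken))
    payload['__RequestVerificationToken'] = AFToken
    # Log in with the token, then request the data within the same session.
    c.post(loginURL + "?isModal=true&returnUrl=/", data=payload)
    response = c.get(dataURL)
    print(response.text)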
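
The index arithmetic in the script is the messy part. Below is a rough sketch of how the token could be pulled out with a regular expression instead, plus a quick check for whether a response body is an HTML page rather than CSV. It assumes the login page embeds the token as antiForgeryToken: '<token>', which is only inferred from the offsets in the script; the names extract_af_token and looks_like_html and the sample string are made up for illustration.

```python
import re

# Assumed page format (inferred, not confirmed): antiForgeryToken: '<token>'
TOKEN_PATTERN = re.compile(r"antiForgeryToken:\s*'([^']+)'")


def extract_af_token(html):
    """Return the anti-forgery token found in the login page source."""
    match = TOKEN_PATTERN.search(html)
    if match is None:
        raise ValueError("antiForgeryToken not found in login page")
    return match.group(1)


def looks_like_html(text):
    """Heuristic: True if the body starts like an HTML page rather than CSV."""
    head = text.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Hypothetical fragment standing in for the real login page source.
sample = "push({ antiForgeryToken: 'abc123', isAnonymous: true });"
print(extract_af_token(sample))
```

In the script above, extract_af_token(response) would replace the slicing line, and looks_like_html(response.text) after the final request gives a quick signal that the login did not go through.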

Upvotes: 3
