Reputation: 13
I'm trying to download the Titanic training and test data from Kaggle in a Jupyter Notebook. My code snippet is below.
import os
from requests import session

payload = {
    'action': 'login',
    'username': os.environ.get("KAGGLE_USERNAME"),
    'password': os.environ.get("KAGGLE_PASSWORD")
}
url = "https://www.kaggle.com/c/3136/download/train.csv"
with session() as c:
    c.post('https://www.kaggle.com/account/login', data=payload)
    response = c.get(url)
    print(response.text)
After executing this, I get an HTML response instead of the training data. I've also configured my Kaggle login credentials properly in a .env file. Did I do something wrong here?
Upvotes: 1
Views: 1631
Reputation: 3229
The site you are interested in uses anti-forgery tokens to prevent attacks such as cross-site request forgery (CSRF). Your login was not successful, which is why your script was not working. The tokens are an obstacle, but nothing we cannot overcome with the magic of Python. I made an account, and the following script successfully pulls down the CSV data you want. Note: I had to parse the anti-forgery token out of the login page, and my code to do so is a bit messy, but it works.
import requests

payload = {
    '__RequestVerificationToken': '',  # filled in after fetching the login page
    'username': 'OMITTED',
    'password': 'OMITTED',
    'rememberme': 'false'
}
loginURL = 'https://www.kaggle.com/account/login'
dataURL = "https://www.kaggle.com/c/3136/download/train.csv"

with requests.Session() as c:
    # Fetch the login page first; the anti-forgery token is embedded in its source.
    response = c.get(loginURL).text
    # Slice the token out by position; the offsets depend on the page's exact layout.
    AFToken = response[response.index('antiForgeryToken')+19:response.index('isAnonymous: ')-12]
    print("AntiForgeryToken={}".format(AFToken))
    payload['__RequestVerificationToken'] = AFToken
    # Log in with the token, then download the CSV in the same session.
    c.post(loginURL + "?isModal=true&returnUrl=/", data=payload)
    response = c.get(dataURL)
    print(response.text)
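The string slicing above depends on exact character offsets in the login page, so it breaks if the markup shifts by even one character. A slightly more tolerant variant might look like the sketch below, which uses a regular expression instead. The helper names `extract_af_token` and `looks_like_html` are hypothetical, and the pattern assumes the token appears as `antiForgeryToken: '...'` in the page source. That format is only inferred from the offsets in the slice and has not been checked against the live site.

```python
import re

# Assumed format, inferred from the slice offsets above: antiForgeryToken: '<token>'
_TOKEN_RE = re.compile(r"antiForgeryToken:\s*'([^']+)'")


def extract_af_token(html):
    """Return the anti-forgery token found in html, or raise ValueError."""
    match = _TOKEN_RE.search(html)
    if match is None:
        raise ValueError("antiForgeryToken not found in login page")
    return match.group(1)


def looks_like_html(text):
    """Rough check for an HTML page (e.g. a login page) instead of CSV data."""
    head = text.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Made-up sample inputs for illustration; not real Kaggle markup.
sample = "antiForgeryToken: 'abc123',\n        isAnonymous: true"
print(extract_af_token(sample))                                   # abc123
print(looks_like_html("<!DOCTYPE html><html></html>"))            # True
print(looks_like_html("PassengerId,Survived,Pclass\n1,0,3\n"))    # False
```

In the script above, `AFToken = extract_af_token(response)` could stand in for the slice, and calling `looks_like_html(response.text)` after the download gives a quick hint that the login did not succeed.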
Upvotes: 3