Reputation: 39
I am attempting to write my first web scraper for a test website. It involves logging in, I followed a tutorial on how to handle such situations.
import requests
from lxml import html
payload = {
"email": "[email protected]",
"password": "123qweasd",
"_token": "3ow4dl7COwnRHa8a6nvNGp4eLkF3wQapT3otGXjR"
}
rs = requests.session()
login_url = 'https://cloud.webscraper.io/login'
log_page = rs.get(login_url)
tree = html.fromstring(log_page.content)
auth_token = list(set(tree.xpath("//input[@name='_token']/@value")))[0]
login = rs.post(login_url,data=payload, headers=dict(referer=login_url))
url = "https://cloud.webscraper.io/sitemaps"
result = rs.get(url, headers=dict(referer=url))
tree = html.fromstring(result.text)
sidebar_cat = tree.xpath('//*[@id="main-menu-inner"]/ul')
print(sidebar_cat)
I wanted this script to list the categories from the sidebar. It seems that the script returns and empty list each time. Current output is
"[]
Process finished with exit code 0"
Upvotes: 1
Views: 247
Reputation: 52665
You've extracted _token
value, but used hardcoded value instead. Try to pass extracted value to payload
:
import requests
from lxml import html
rs = requests.session()
login_url = 'https://cloud.webscraper.io/login'
log_page = rs.get(login_url)
tree = html.fromstring(log_page.content)
auth_token = tree.xpath("//input[@name='_token']/@value")[0]
payload = {
"email": "[email protected]",
"password": "123qweasd",
"_token": auth_token
}
login = rs.post(login_url,data=payload, headers=dict(referer=login_url))
url = "https://cloud.webscraper.io/sitemaps"
result = rs.get(url, headers=dict(referer=url))
tree = html.fromstring(result.text)
sidebar_cat = tree.xpath('//*[@id="main-menu-inner"]/ul')
print(sidebar_cat)
Upvotes: 1
Reputation: 1
Try printing the 'result' out and temporarily comment out everything after. That way you know if its an error with the request part or if the problem is from the response processing. If the response prints as you expected, then troubleshoot the last three lines of code. If not, then troubleshoot the request part of the code.
Upvotes: 0