Michal

Reputation: 39

Web scraping script returns an empty list

I am attempting to write my first web scraper for a test website. It involves logging in, so I followed a tutorial on how to handle such situations.

import requests
from lxml import html



payload = {
"email": "[email protected]",
"password": "123qweasd",
"_token": "3ow4dl7COwnRHa8a6nvNGp4eLkF3wQapT3otGXjR"
 }

rs = requests.session()

login_url = 'https://cloud.webscraper.io/login'
log_page = rs.get(login_url)

tree = html.fromstring(log_page.content)
auth_token = list(set(tree.xpath("//input[@name='_token']/@value")))[0]

login = rs.post(login_url,data=payload, headers=dict(referer=login_url))

url = "https://cloud.webscraper.io/sitemaps"
result = rs.get(url, headers=dict(referer=url))

tree = html.fromstring(result.text)
sidebar_cat = tree.xpath('//*[@id="main-menu-inner"]/ul')

print(sidebar_cat)

I wanted this script to list the categories from the sidebar, but it returns an empty list each time. The current output is:

"[] 
Process finished with exit code 0"

Upvotes: 1

Views: 247

Answers (2)

Andersson

Reputation: 52665

You've extracted the _token value from the login page, but then posted a hardcoded value instead. Try passing the extracted value in the payload:

import requests
from lxml import html

rs = requests.session()

# Fetch the login page first so the session receives its cookies
login_url = 'https://cloud.webscraper.io/login'
log_page = rs.get(login_url)

# Read the fresh CSRF token from the hidden form field
tree = html.fromstring(log_page.content)
auth_token = tree.xpath("//input[@name='_token']/@value")[0]

# Send the extracted token, not a hardcoded one
payload = {
    "email": "[email protected]",
    "password": "123qweasd",
    "_token": auth_token
}

login = rs.post(login_url, data=payload, headers=dict(referer=login_url))

# Request the protected page with the same (now logged-in) session
url = "https://cloud.webscraper.io/sitemaps"
result = rs.get(url, headers=dict(referer=url))

tree = html.fromstring(result.text)
sidebar_cat = tree.xpath('//*[@id="main-menu-inner"]/ul')

print(sidebar_cat)

Upvotes: 1

il-dar

Reputation: 1

Try printing the contents of 'result' (for example result.text) and temporarily commenting out everything after it. That way you can tell whether the problem is in the request or in the response processing. If the response looks as you expected, troubleshoot the last three lines of code. If not, troubleshoot the request part of the code.
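
A rough sketch of that check, in case it helps. The names diagnose and _PasswordFieldFinder are made up for illustration, and it rests on the assumption that a request which is not logged in gets the login form back (detected here by a password field). It only needs an object with .status_code, .url and .text, so the 'result' from requests should work, and the demo below uses a stand-in object instead of a real request.

```python
from html.parser import HTMLParser
from types import SimpleNamespace


class _PasswordFieldFinder(HTMLParser):
    """Hypothetical helper: sets `found` if the page has <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def diagnose(response):
    """Summarise a response-like object with .status_code, .url and .text."""
    finder = _PasswordFieldFinder()
    finder.feed(response.text)
    return {
        "status_code": response.status_code,
        "final_url": response.url,
        # Assumption: a password field means we were served the login form
        "looks_like_login_page": finder.found,
    }


# Stand-in for a response that was bounced back to the login form
fake = SimpleNamespace(
    status_code=200,
    url="https://cloud.webscraper.io/login",
    text='<form><input type="password" name="password"></form>',
)
print(diagnose(fake))
```

If looks_like_login_page comes back True for the sitemaps request, the login step is the part to troubleshoot. If it is False, the XPath in the last three lines is the more likely suspect.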

Upvotes: 0
