UpmostScarab

Reputation: 985

Python - Same xpath in selenium and lxml different results

I have this site, http://www.google-proxy.net/, and I need to get the first proxy's ip:port.

import time
from selenium import webdriver

br = webdriver.Firefox()
br.get("http://www.google-proxy.net/")
ip = br.find_element_by_xpath("//tr[@class='odd']/td[1]").text
time.sleep(1)
port = br.find_element_by_xpath("//tr[@class='odd']/td[2]").text
time.sleep(1)

This works fine. But now I want to do the same with lxml:

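import lxml.html
import requests

proxy_server = "http://www.google-proxy.net/"  # assumed: the same page as above
page = requests.get(proxy_server)
root = lxml.html.fromstring(page.text)
ip = root.xpath("//tr[@class='odd']/td[1]/text()")
port = root.xpath("//tr[@class='odd']/td[2]/text()")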

and I get empty lists. Why is that?

Upvotes: 1

Views: 250

Answers (2)

gtlambert

Reputation: 11971

When you use Selenium to open http://www.google-proxy.net, JavaScript is enabled. In this case, JavaScript adds the classes odd and even to the tr elements.

The requests.get method fetches the raw HTML from http://www.google-proxy.net without executing any JavaScript. So the classes odd and even are never added to the tr elements, and your lxml XPath selects nothing. To see this for yourself in a browser, you can use a JavaScript switcher plugin (e.g. a Chrome plugin), which lets you load webpages with JavaScript disabled.
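The effect can be reproduced offline. The sketch below is only an illustration: both HTML strings are hypothetical stand-ins (the real page's markup is not shown here), one for what requests would receive and one for the table after a script has tagged its rows. The same XPath is run against each.

```python
import lxml.html

# Hypothetical markup: the table as served, before any script has run.
STATIC_HTML = (
    "<html><body><table><tbody>"
    "<tr><td>203.0.113.10</td><td>8080</td></tr>"
    "<tr><td>203.0.113.11</td><td>3128</td></tr>"
    "</tbody></table></body></html>"
)

# Hypothetical markup: the same table after a script added row classes.
RENDERED_HTML = (
    "<html><body><table><tbody>"
    '<tr class="odd"><td>203.0.113.10</td><td>8080</td></tr>'
    '<tr class="even"><td>203.0.113.11</td><td>3128</td></tr>'
    "</tbody></table></body></html>"
)

XPATH = "//tr[@class='odd']/td[1]/text()"

static_result = lxml.html.fromstring(STATIC_HTML).xpath(XPATH)
rendered_result = lxml.html.fromstring(RENDERED_HTML).xpath(XPATH)

print(static_result)    # []
print(rendered_result)  # ['203.0.113.10']
```

The XPath itself is fine; it just has nothing to match until the class attribute exists.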

Upvotes: 1

Radosław Roszkowiak

Reputation: 6881

It looks like the 'odd' classes are added by JavaScript on this site.

Selenium drives a real browser, which executes the JavaScript, so you get the expected class.

The requests library does not execute JavaScript, so there is no 'odd' class.
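One possible workaround is to select the row by position instead of by class. This is a minimal sketch, not a tested solution for this site: first_proxy is a hypothetical helper, the sample HTML is made up, and it assumes the first row containing td cells holds the IP in cell 1 and the port in cell 2.

```python
import lxml.html


def first_proxy(html_text):
    """Return (ip, port) from the first table row with data cells, or None."""
    root = lxml.html.fromstring(html_text)
    # Select by position, not by class, so the result does not depend on
    # attributes that a script would add later. Header rows (th only) are
    # skipped because the predicate requires at least one td child.
    rows = root.xpath("(//tr[td])[1]")
    if not rows:
        return None
    cells = [td.text_content().strip() for td in rows[0].xpath("./td")]
    if len(cells) < 2:
        return None
    return cells[0], cells[1]


# Hypothetical sample standing in for page.text from requests.get(...).
sample = (
    "<html><body><table>"
    "<thead><tr><th>IP</th><th>Port</th></tr></thead>"
    "<tbody>"
    "<tr><td>203.0.113.10</td><td>8080</td></tr>"
    "<tr><td>203.0.113.11</td><td>3128</td></tr>"
    "</tbody></table></body></html>"
)

print(first_proxy(sample))  # ('203.0.113.10', '8080')
```

If the page has other tables before the proxy list, the XPath would need to be narrowed, for example by the table's id, assuming it has one.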

Upvotes: 2
