Reputation: 985
I have this site http://www.google-proxy.net/ and I need to get the first proxy's ip:port.
from selenium import webdriver
from selenium.webdriver.common.by import By

br = webdriver.Firefox()
br.get("http://www.google-proxy.net/")
ip = br.find_element(By.XPATH, "//tr[@class='odd']/td[1]").text
port = br.find_element(By.XPATH, "//tr[@class='odd']/td[2]").text
and it works fine. But now I want to do the same with lxml:
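import lxml.html
import requests

proxy_server = "http://www.google-proxy.net/"
page = requests.get(proxy_server)
root = lxml.html.fromstring(page.text)
ip = root.xpath("//tr[@class='odd']/td[1]/text()")
port = root.xpath("//tr[@class='odd']/td[2]/text()")  # td[2] is the port column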
and I get empty lists. Why is that?
Upvotes: 1
Views: 250
Reputation: 11971
When you use Selenium to open http://www.google-proxy.net, JavaScript is enabled. In this case, JavaScript adds the classes odd and even to the tr elements.
The requests.get method only downloads the HTML from http://www.google-proxy.net; it does not execute any JavaScript. So the classes odd and even are never added to the tr elements, and your lxml XPath expressions select nothing. To see the page the way requests sees it, you can load it in a browser with JavaScript disabled, e.g. with a JavaScript switcher extension for Chrome.
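A minimal sketch of one workaround, assuming the proxy list is an ordinary HTML table whose data rows use td cells and whose first two columns are IP and port: select the first data row by its structure instead of by the JavaScript-added class. RAW_HTML and first_proxy are hypothetical stand-ins (the addresses come from the 192.0.2.0/24 documentation range), not the real page; against the live site you would pass page.text instead, after checking its actual markup.

```python
import lxml.html

# Hypothetical stand-in for the raw HTML that requests.get() receives:
# the rows carry no class="odd"/"even" because no JavaScript has run.
RAW_HTML = """<html><body>
<table>
  <thead><tr><th>IP Address</th><th>Port</th></tr></thead>
  <tbody>
    <tr><td>192.0.2.1</td><td>8080</td></tr>
    <tr><td>192.0.2.2</td><td>3128</td></tr>
  </tbody>
</table>
</body></html>"""


def first_proxy(html):
    """Return (ip, port) from the first data row, or None if there is none.

    Assumes column 1 is the IP and column 2 is the port.
    """
    root = lxml.html.fromstring(html)
    # Data rows contain <td> cells; header rows only contain <th>.
    # The parentheses make [1] mean "first match in the whole document".
    rows = root.xpath("(//table//tr[td])[1]")
    if not rows:
        return None
    row = rows[0]
    # string() yields the cell's full text even if it wraps nested tags.
    ip = row.xpath("string(td[1])").strip()
    port = row.xpath("string(td[2])").strip()
    return ip, port


print(first_proxy(RAW_HTML))  # ('192.0.2.1', '8080')
```

Selecting on structure (tr[td]) keeps working whether or not a script later adds odd/even classes, but it is only as reliable as the assumption about the column order.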
Upvotes: 1
Reputation: 6881
It looks like the 'odd' classes are added by JavaScript on this site.
Selenium drives a real browser, which executes the JavaScript, so the rows have the expected class.
The requests library does not execute JS, so there is no 'odd' class.
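To make the explanation concrete, here is a small sketch that runs the question's XPath against two hypothetical versions of the same row: the raw markup a client that does not run JavaScript (such as requests) receives, and the markup after a script has added class="odd", roughly what the Selenium-driven browser ends up with. The HTML strings and the helper name ips are illustrative, not copied from the site.

```python
import lxml.html

# The XPath from the question; it only matches rows carrying class="odd".
XPATH_IP = "//tr[@class='odd']/td[1]/text()"

# Hypothetical raw markup, as a client that does not run JavaScript sees it.
RAW = "<html><body><table><tr><td>192.0.2.1</td><td>8080</td></tr></table></body></html>"

# The same markup after a script has tagged the row with class="odd".
RENDERED = RAW.replace("<tr>", '<tr class="odd">', 1)


def ips(html):
    """Apply the question's XPath to an HTML string and return plain strings."""
    return [str(t) for t in lxml.html.fromstring(html).xpath(XPATH_IP)]


print(ips(RAW))       # [] because no row has class="odd"
print(ips(RENDERED))  # ['192.0.2.1']
```

If you want to keep the class-based XPath, one option is to parse br.page_source from the Selenium session with lxml instead of page.text, since page_source generally reflects the DOM after the page's scripts have run.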
Upvotes: 2