Vindhya G

Reputation: 1369

selenium with python web crawler

I want to screen-scrape a website that has multiple pages. The pages are loaded dynamically without the URL changing, so I'm using Selenium to scrape it. But I'm getting an exception from this simple program.

import re
from contextlib import closing
from selenium.webdriver import Firefox 

url="http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"

with closing(Firefox()) as browser:
    browser.get(url)
    n = 2
    link = browser.find_element_by_link_text(str(n))
    link.click()
    #web_page=browser.page_source
    #print type(web_page)

The error is as follows:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"2"}' ; Stacktrace: Method FirefoxDriver.prototype.findElementInternal_ threw an error in file:///tmp/tmpMJeeTr/extensions/[email protected]/components/driver_component.js 

Is the problem with the given URL or with the Firefox browser? Any help would be greatly appreciated.

Upvotes: 1

Views: 4925

Answers (2)

Cory Walker

Reputation: 4987

I'm developing a Python module that might cover your (or someone else's) use case:

https://github.com/cmwslw/selenium-crawler

It converts recorded Selenium scripts into crawling functions, so you avoid writing any of the above code by hand. It works great with pages that load content dynamically. I hope someone finds this useful.

Upvotes: 1

RocketDonkey

Reputation: 37269

I think your main issue is that the page itself takes a while to load, and you are immediately trying to access that link (which likely hasn't rendered yet, hence the stack trace). One thing you can try is an implicit wait, which tells the browser to poll the DOM for a certain period of time for elements to appear before timing out. In your case, you could try the following, which waits for up to 10 seconds while polling the DOM for a particular item (here, the link with text '2'):

browser.implicitly_wait(10)
n = 2
link = browser.find_element_by_link_text(str(n))
link.click()
#web_page=browser.page_source
#print type(web_page)
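If you would rather wait only for this specific element instead of setting a driver-wide timeout, an explicit wait does the same job. A minimal sketch, assuming the same URL from your question and an illustrative 10-second timeout:

    from contextlib import closing

    from selenium.webdriver import Firefox
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    url = "http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"

    with closing(Firefox()) as browser:
        browser.get(url)
        # Block for up to 10 seconds until a link with text "2" is present
        # in the DOM; raises TimeoutException if it never appears.
        link = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.LINK_TEXT, "2"))
        )
        link.click()

The explicit wait keeps the timeout scoped to this one lookup, whereas implicitly_wait applies to every subsequent find_element call on the driver.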

Upvotes: 1
