aste123

Reputation: 1242

Why does selenium wait for a long time before executing this code?

I'm trying to do infinite scrolling on this page and here is my code:

from selenium import webdriver
import time

profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override","Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0")
driver = webdriver.Firefox(profile)

driver.get("http://www.quora.com/Programming-Languages/followers")
for n in range(0,5): # For testing I have capped this at 5, will handle this properly once things start to work.
    driver.execute_script("window.scrollTo(0,1000000);")
    time.sleep(2)

So when I run this, it waits a long time (sometimes more than a minute) before doing any scrolling, and then waits again for the same amount of time before the next scroll. The code seems to work fine on other pages. Any ideas on how to fix this?
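
For reference, a common infinite-scroll pattern is to keep scrolling until the document height stops growing, rather than scrolling a fixed number of times. A minimal sketch (the helper name and the `pause`/`max_rounds` parameters are mine, not from the question):

```python
import time

def scroll_to_bottom(driver, pause=2, max_rounds=50):
    """Scroll until the page height stops growing (i.e. no more content loads)."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we are at the bottom
        last_height = new_height
```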

When I try to use Chrome instead of Firefox (by adding driver = webdriver.Chrome('/home/asdf/apps/chromedrive/chromedriver') to the .py file), I get this error:

Traceback (most recent call last):
  File "ok.py", line 8, in <module>
    driver = webdriver.Chrome('/home/asdf/apps/chromedrive/chromedriver')
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 65, in __init__
    keep_alive=True)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 73, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 121, in start_session
    'desiredCapabilities': desired_capabilities,
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 379, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

Upvotes: 2

Views: 3468

Answers (1)

alecxe

Reputation: 474081

Switching to Chrome() helped me solve the problem:

import time
from selenium import webdriver

followers_per_page = 18

driver = webdriver.Chrome()
driver.get("http://www.quora.com/Programming-Languages/followers")

# get the followers count
element = driver.find_element_by_class_name('count')
followers_count = int(element.text.replace('k', '000').replace('.', ''))
print followers_count

# scroll down the page iteratively with a delay
for _ in xrange(0, followers_count/followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 1000000);")
    time.sleep(2)

FYI, I'm using a slightly different approach: parsing the followers count and calculating the number of scrolls needed, taking into account the fact that the page loads 18 followers at a time.
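
Note that the replace-based parsing in the code above only works cleanly for counts without a decimal point: '5k' becomes 5000, but '5.6k' becomes '5.6000' and then 56000, off by a factor of ten. A more explicit conversion avoids that (the helper names are mine):

```python
def parse_follower_count(text):
    """Convert a Quora-style count such as '5.6k' or '1,234' to an int."""
    text = text.strip().lower().replace(',', '')
    if text.endswith('k'):
        return int(float(text[:-1]) * 1000)
    return int(text)

def scroll_rounds(followers_count, followers_per_page=18):
    """How many scrolls are needed if each scroll loads 18 followers."""
    return followers_count // followers_per_page + 1
```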

I've actually worked on a similar Quora question before.


Well, this was not the first thing that came to mind. Here's the story.

The problem is that there are pending requests to the http://tch840195.tch.quora.com/up/chan5-8886/updates URL that take minutes to complete. This is what makes selenium think the page is not completely loaded. And things get worse: this is a periodic thing that happens every X seconds. Think of it as long polling.

I've tried multiple things to overcome the problem using Firefox webdriver:

  • set webdriver.load.strategy preference to unstable
  • set the network.http.response.timeout, network.http.connection-timeout, network.http.keep-alive.timeout and network.http.request.max-start-delay preferences
  • set page load timeout:

    driver.set_page_load_timeout(3)
    
  • set script timeout:

    driver.set_script_timeout(3)
    
  • call window.stop(); hoping it would stop active requests:

    driver.execute_script('window.stop();')
    
  • updated to the most recent Firefox and selenium package versions

One other option that might work is to somehow block the request to that "slow URL", either by using a proxy server and pointing Firefox at it, or, if possible, by telling Firefox to blacklist the URL (probably through an extension).
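
One way to sketch the proxy idea without an extension is a proxy auto-config (PAC) script that routes only the slow host to an unreachable proxy, so its requests fail fast while everything else goes direct. The PAC body and the 127.0.0.1:9 "dead" proxy are my assumptions, not something from the original answer:

```python
def make_blocking_pac(blocked_host):
    """Build a PAC script that sends one host to an unreachable proxy
    so its requests fail fast, while all other traffic goes direct."""
    return (
        'function FindProxyForURL(url, host) {\n'
        '  if (host == "' + blocked_host + '") return "PROXY 127.0.0.1:9";\n'
        '  return "DIRECT";\n'
        '}\n'
    )

pac = make_blocking_pac("tch840195.tch.quora.com")
```

You would then save this script to a file and point Firefox at it via profile preferences, e.g. profile.set_preference("network.proxy.type", 2) and profile.set_preference("network.proxy.autoconfig_url", "file:///path/to/block.pac") (the path is a placeholder). Whether failing the long-poll request fast actually unblocks the page load is untested here.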

Also see the relevant Selenium issue, which has multiple workarounds inside.

Upvotes: 1
