name

Reputation: 53

WebException timeout Selenium Python, works first time through loop, then times out

I have a client that wants me to web scrape this sketchy website. The loop works the first time through, then the error occurs. Any help? I suggest not visiting the website, but hopefully the pay's worth my time lol.

import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException

options = webdriver.ChromeOptions()
options.add_argument("--incognito")
PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string so the backslashes stay literal
URL = 'https://avbebe.com/archives/category/高清中字/page/5'
driver = webdriver.Chrome(executable_path=PATH, options=options)
driver.get(URL)

time.sleep(5)
Vid = driver.find_elements_by_class_name('entry-title')
for title in Vid:
    actions = ActionChains(driver)
    time.sleep(5)
    WebDriverWait(title, 10).until(EC.element_to_be_clickable((By.TAG_NAME, 'a')))#where error occurs
    actions.double_click(title).perform()
    time.sleep(5)
    VidUrl = driver.current_url
    VidTitle = driver.find_element_by_xpath('//*[@id="post-69331"]/h1/a').text
    try:
        VidTags = driver.find_elements_by_class_name('tags')
        for tag in VidTags:
            VidTag = tag.find_element_by_tag_name('a').text
        
    except (NoSuchElementException, StaleElementReferenceException):
        pass
    
    with open('data.csv', 'w', newline='', encoding = "utf-8") as f:
        fieldnames = ['Title', 'Tags', 'URL']
        thewriter = csv.DictWriter(f, fieldnames=fieldnames)

        thewriter.writeheader()
        thewriter.writerow({'Title': VidTitle, 'Tags': VidTag, 'URL': VidUrl})
    driver.back()
    driver.refresh()
print('done')        

Error:

WebDriverWait(title, 10).until(EC.element_to_be_clickable((By.TAG_NAME, 'a')))
  File "C:\Users\Heage\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Upvotes: 1

Views: 266

Answers (2)

Zyy

Reputation: 854

You are nearly there, just missing a few pieces.

Firstly, you fetch all the links to the videos and then navigate through them in a loop:

Vid = driver.find_elements_by_class_name('entry-title')
for title in Vid:
    # ...
    WebDriverWait(title, 10).until(EC.element_to_be_clickable((By.TAG_NAME, 'a')))
    # ...
    driver.back()
    driver.refresh()

What happens is that once the browser navigates to a different URL, all of those elements become stale: the browser no longer holds a reference to the original DOM nodes, so any attempt to interact with them throws a StaleElementReferenceException (or, as in your case, the clickability wait never succeeds and times out).

So what you need to do is read all the available links into a collection up front and navigate to each one with driver.get, without any need to go back or refresh the page:

link_elements = driver.find_elements_by_css_selector('.entry-title a')
links = {link_element.get_attribute('href') for link_element in link_elements}

for link in links:
    driver.get(link) # otherwise, stale elements
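As a side note, a set comprehension is used above so that duplicate hrefs on the listing page (e.g. a thumbnail link and a title link pointing at the same post) collapse into a single entry. A quick standalone illustration with made-up URLs:

```python
# Two different anchor elements can point at the same post; collecting
# the hrefs into a set keeps only one copy of each URL.
hrefs = [
    "https://example.com/archives/1",
    "https://example.com/archives/1",  # same post linked twice on the page
    "https://example.com/archives/2",
]
links = {href for href in hrefs}
print(sorted(links))
```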

Next, once you open the page, you are searching for an element with an id.

    VidTitle = driver.find_element_by_xpath('//*[@id="post-69331"]/h1/a').text

However, keep in mind that ids like this change from page to page, so your script is likely to fail here. Instead, try to find classes that don't change. I took a look at the page and found that the video title is an h1 tag with an entry-title class, so I used that instead:

    VidTitle = driver.find_element_by_css_selector('h1.entry-title').text

Working solution


import csv

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--incognito")
driver = webdriver.Chrome(options=options)

URL = 'https://avbebe.com/archives/category/高清中字/page/5'

driver.get(URL)

link_elements = driver.find_elements_by_css_selector('.entry-title a')
links = {link_element.get_attribute('href') for link_element in link_elements}

for link in links:
    driver.get(link)

    VidUrl = driver.current_url
    VidTitle = driver.find_element_by_css_selector('h1.entry-title').text
    try:
        VidTags = driver.find_elements_by_class_name('tags')
        for tag in VidTags:
            VidTag = tag.find_element_by_tag_name('a').text

    except (NoSuchElementException, StaleElementReferenceException):
        pass

    with open('data.csv', 'w', newline='', encoding="utf-8") as f:
        fieldnames = ['Title', 'Tags', 'URL']
        thewriter = csv.DictWriter(f, fieldnames=fieldnames)

        thewriter.writeheader()
        thewriter.writerow({'Title': VidTitle, 'Tags': VidTag, 'URL': VidUrl})

print('done')
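One caveat with the snippet above: opening data.csv with mode 'w' inside the loop truncates the file on every iteration, so only the last video's row survives. A minimal sketch of the fix is to collect the rows first and write the file once at the end (the row values below are placeholders standing in for the scraped title, tags, and URL):

```python
import csv

# Placeholder rows standing in for the scraped (Title, Tags, URL) values;
# in the real script you would append one dict per video inside the loop.
rows = [
    {'Title': 'Video A', 'Tags': 'tag1', 'URL': 'https://example.com/a'},
    {'Title': 'Video B', 'Tags': 'tag2', 'URL': 'https://example.com/b'},
]

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['Title', 'Tags', 'URL'])
    writer.writeheader()    # header written exactly once
    writer.writerows(rows)  # one row per scraped video
```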

Upvotes: 3

Jortega

Reputation: 3790

Put the line driver.get(URL) inside the loop. Remove driver.back() and driver.refresh().

options = webdriver.ChromeOptions()
options.add_argument("--incognito")
PATH = r'C:\Program Files (x86)\chromedriver.exe'
URL = 'https://avbebe.com/archives/category/高清中字/page/5'
driver = webdriver.Chrome(executable_path=PATH, options=options)
driver.get(URL)

time.sleep(5)
Vid = driver.find_elements_by_class_name('entry-title')
for title in Vid:
    driver.get(URL)
    actions = ActionChains(driver)
    time.sleep(5)
    WebDriverWait(title, 10).until(EC.element_to_be_clickable((By.TAG_NAME, 'a')))#where error occurs
    actions.double_click(title).perform()
    time.sleep(5)
    VidUrl = driver.current_url
    VidTitle = driver.find_element_by_xpath('//*[@id="post-69331"]/h1/a').text
    try:
        VidTags = driver.find_elements_by_class_name('tags')
        for tag in VidTags:
            VidTag = tag.find_element_by_tag_name('a').text
        
    except (NoSuchElementException, StaleElementReferenceException):
        pass
    
    with open('data.csv', 'w', newline='', encoding = "utf-8") as f:
        fieldnames = ['Title', 'Tags', 'URL']
        thewriter = csv.DictWriter(f, fieldnames=fieldnames)

        thewriter.writeheader()
        thewriter.writerow({'Title': VidTitle, 'Tags': VidTag, 'URL': VidUrl})
    #driver.back()
    #driver.refresh()
print('done')

Upvotes: 0
