PSCM

Reputation: 85

Problem iterating through URLs with Selenium

I'm trying to iterate through the pages of a site using the Selenium library, but I can only get the home page.

I rewrote my code to try to fix this problem, but now I only get the following message: InvalidSessionIdException: Message: invalid session id.

The code is below:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

# Build the options first so they can actually be passed to the driver
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors-spki-list')
options.add_argument('--ignore-ssl-errors')
driver = webdriver.Chrome(executable_path=r'C:\MYPATH\chromedriver.exe', options=options)

title_list = []
date_list  = []
genre_list = []

for page_num in range(1, 11):
    url = r"https://www.albumoftheyear.org/list/1500-rolling-stones-500-greatest-albums-of-all-time-2020/{}".format(page_num)
    driver.get(url)

    try:
        # Wait up to 10 seconds for the list container to be present
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "centerContent"))
        )
        albumlistrow = element.find_elements_by_class_name('albumListRow')
        for a in albumlistrow:
            title = a.find_element_by_class_name('albumListTitle')
            date = a.find_element_by_class_name('albumListDate')
            try:
                genre = a.find_element_by_class_name('albumListGenre').text
            except NoSuchElementException:
                genre = ''  # not every album lists a genre
            title_list.append(title.text)
            date_list.append(date.text)
            genre_list.append(genre)

    finally:
        # This block runs at the end of every loop iteration
        driver.close()

df = pd.DataFrame(list(zip(title_list, date_list, genre_list)), columns=['title', 'date', 'genre'])
df.head()
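If the try/finally block runs inside the for loop (it sits at the same level as driver.get(url)), that would explain the InvalidSessionIdException: the finally clause calls driver.close() after the first page, which closes the only window and ends the session, so the next driver.get() is sent to a session that no longer exists. The sketch below is a hypothetical restructuring under that assumption, not the original code: page_urls, cell_text, parse_album_rows and scrape_pages are illustrative names, the driver is passed in so the logic can be exercised without a browser, and the string "class name" is the value of Selenium's By.CLASS_NAME. It also leaves out the WebDriverWait on centerContent for brevity.

```python
from typing import Callable, Iterable, List, Optional

# URL pattern taken from the question; the list pages are numbered 1..10.
BASE_URL = ("https://www.albumoftheyear.org/list/"
            "1500-rolling-stones-500-greatest-albums-of-all-time-2020/{}")


def page_urls(first: int = 1, last: int = 10) -> List[str]:
    """Build the list-page URLs, including both ends of the range."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]


def cell_text(row, class_name: str) -> Optional[str]:
    """Return the text of the first child with class_name, or None.

    find_elements (plural) returns an empty list instead of raising
    NoSuchElementException, so no try/except is needed for a missing cell.
    "class name" is the string value of selenium's By.CLASS_NAME.
    """
    matches = row.find_elements("class name", class_name)
    return matches[0].text if matches else None


def parse_album_rows(driver) -> list:
    """Hypothetical parser: one (title, date, genre) tuple per album row."""
    rows = driver.find_elements("class name", "albumListRow")
    return [(cell_text(r, "albumListTitle"),
             cell_text(r, "albumListDate"),
             cell_text(r, "albumListGenre")) for r in rows]


def scrape_pages(driver, urls: Iterable[str],
                 parse_page: Callable[[object], list]) -> list:
    """Visit every URL with ONE driver session and end it only at the end."""
    results: list = []
    try:
        for url in urls:
            driver.get(url)
            results.extend(parse_page(driver))
    finally:
        # Runs once, after the whole loop (or on an error), not per page.
        driver.quit()
    return results
```

With a real browser the pieces would be combined roughly as rows = scrape_pages(driver, page_urls(), parse_album_rows), followed by pd.DataFrame(rows, columns=['title', 'date', 'genre']). driver.quit() ends the whole session, whereas driver.close() only closes the current window (which also ends the session when it is the last one).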

Upvotes: 0

Views: 606

Answers (1)

bilke

Reputation: 415

I'm not sure about your error, but since you said you have a problem iterating over the pages, I tried to replicate it and found that the site uses Cloudflare's bot protection. Adding this argument to the Chrome options:

options.add_argument("--disable-blink-features=AutomationControlled")

This seems to fix the problem.

Tested with the code below:

from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
# ChromeDriverManager (webdriver-manager package) is what I use locally; you can pass your executable_path instead
d = webdriver.Chrome(ChromeDriverManager().install(), options=options)
d.implicitly_wait(5)

for page_num in range(1, 11):
    url = r"https://www.albumoftheyear.org/list/1500-rolling-stones-500-greatest-albums-of-all-time-2020/{}".format(page_num)
    d.get(url)
    sleep(3)

d.quit()  # end the session once, after all pages have been visited

Upvotes: 1
