Reputation: 23
I am having a hard time browsing through the 448 consecutive pages of the following page with Selenium under Python in a robust manner. I have tried (too) many things without a satisfactory result, which makes it difficult to post relevant code.
I would like to see your solution. Apologies if the question is not well formulated; this is my first time posting.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
WebDriverWait(browser, 1000).until(EC.element_to_be_clickable((By.CLASS_NAME,'next'))).click()
input('Press ENTER to close the automated browser')
I get the following error: selenium.common.exceptions.ElementNotInteractableException: Message: Element could not be scrolled into view
Upvotes: 2
Views: 246
Reputation: 10460
Every time you click the next-page button ('Suivant'), the JavaScript in the page makes a POST request to an API endpoint, with headers and a payload. The headers, payload and API endpoint can be found in the browser's Dev Tools, under the Network tab (filter on XHR calls). Hence, we can scrape that API URL directly with requests and avoid the overhead of Selenium/chromedriver. Below is a way of obtaining that data:
import requests
import pandas as pd
big_df = pd.DataFrame()
url = ''
headers = {
    'content-type': 'application/json',
    'Origin': '',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
s = requests.Session()
counter = 0  # offset of the first item to fetch
while True:
    # Ask the API for 100 items starting at offset `counter`
    payload = '{"id":"filter-profile-search-template-fr-v3","params":{"categoriesSlugList":[],"programsSlugList":[],"from":' + str(counter) + ',"regionsList":[],"size":100}}'
    r = s.post(url, headers=headers, data=payload)
    # Flatten the nested hits and append them to the running dataframe
    big_df = pd.concat([big_df, pd.json_normalize(r.json()['hits']['hits'])], axis=0, ignore_index=True)
    counter = counter + 100
    if counter > 448*12:  # 448 pages of 12 items each
        break
print(big_df)
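As a variation on the loop above, the payload can be built with json.dumps rather than string concatenation, which avoids quoting mistakes. This is only a sketch: build_payload and page_offsets are hypothetical helper names, the payload fields are copied from the code above, and the 448*12 upper bound is the same estimate the loop uses.

```python
import json


def build_payload(offset, size=100):
    """Return the JSON request body for one slice of results."""
    body = {
        "id": "filter-profile-search-template-fr-v3",
        "params": {
            "categoriesSlugList": [],
            "programsSlugList": [],
            "from": offset,
            "regionsList": [],
            "size": size,
        },
    }
    return json.dumps(body)


def page_offsets(total, size=100):
    """Return the 'from' offsets needed to cover `total` items."""
    return list(range(0, total, size))


offsets = page_offsets(448 * 12)  # 448 pages of 12 items each
print(len(offsets), offsets[-1])  # 54 5300
```

Each offset would then be passed as data=build_payload(offset) in the POST call, so the loop makes 54 requests in total.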
We are getting 100 items at once (the actual page is getting 12 at once). After a minute or so, you should have the following dataframe displayed in your terminal:
_index _type _id _score sort _source.sys.updatedAt _source.fields.shortDescription.en _source.fields.title.en _source.fields.slug.en _source.fields.shortTitle.en
0 contentful-entries_productionv3 _doc 3O1t8sTHhj5ZGrmGKtHI6y None [ Dynamix JAVA] 3O1t8sTHhj5ZGrmGKtHI6y profile 2022-09-01T14:36:06.899Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.388591169708497, 'Lat': 50.7035958197085}, 'Northeast': {'Lng': 4.391289130291502, 'Lat': 50.7062937802915}}, 'coordinates': [4.3898572, 50.7050388], 'type': 'Point', 'Location': {'Lng': 4.3898572, 'Lat': 50.7050388}}, 'Metadata': {'PlaceId': 'ChIJOZeR297Rw0cR_y-bZPZvwzQ', 'AddressType': 'head office', 'Timestamp': '2022-08-29T13:55:32.180Z'}, 'FormattedAddress': 'Av. des Dauphins 17, 1410 Waterloo, Belgique', 'MainAddress': True}] [0715677777] [{'Metadata': {'Timestamp': '2022-08-29T15:58:45+02:00'}, 'URL': ''}] Consulting company specialised in JAVA, SAP, DotNet, and son one. Société de consultance spécialisée en JAVA, SAP, DotNet, etc. dynamix_java.png 160.0 160.0 15950.0 image/png // dynamix java.png 160.0 160.0 15950.0 image/png // Dynamix Java Dynamix Java Dynamix JAVA Dynamix JAVA dynamix-java dynamix-java [{'Metadata': {'Timestamp': '2022-08-29T15:58:14+02:00'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-08-29T15:58:27+02:00'}, 'URL': ''}] NaN NaN NaN NaN NaN NaN NaN NaN
1 contentful-entries_productionv3 _doc 4D2kOg0t4iRD11fzJFaPc8 None [ Lan-Area ] 4D2kOg0t4iRD11fzJFaPc8 profile 2022-08-25T08:42:32.473Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.744188919708497, 'Lat': 50.3149442697085}, 'Northeast': {'Lng': 4.746886880291502, 'Lat': 50.3176422302915}}, 'coordinates': [4.745529299999999, 50.31632769999999], 'type': 'Point', 'Location': {'Lng': 4.745529299999999, 'Lat': 50.31632769999999}}, 'Metadata': {'PlaceId': 'ChIJm9XAKz6SwUcRs45ovYpmEpc', 'AddressType': 'head office', 'Timestamp': '2022-06-21T14:17:33.655Z'}, 'FormattedAddress': 'Rue d'Ermeton 14, 5537 Anhée, Belgique', 'MainAddress': True}] [0779822986] [{'Metadata': {'Timestamp': '2022-08-25T10:42:29+02:00'}, 'URL': ''}] Platform exclusively focused on local sports competition. Lan-Area has created a central calendar where all local events are announced and a Belgian community space where players can post their teams, courses and successes. Plateforme exclusivement tournée vers la compétition e-sportive locale . Lan-Area a créé un calendrier central où tous les évènements locaux sont annoncés et un espace communautaire belge où les joueurs peuvent afficher leurs équipes, parcours et succès. lan-Aera.jpg 450.0 250.0 21154.0 image/jpeg // lan-Aera.jpg 450.0 250.0 21154.0 image/jpeg // lan-Aera Logo Lan-Aera Lan-Aera Lan-Area lan-aera lan-area [{'Metadata': {'Timestamp': '2022-06-21T15:06:34+02:00'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-06-21T15:07:31+02:00'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-06-21T15:59:53+02:00'}, 'URL': ''}] NaN NaN NaN NaN NaN NaN NaN NaN
2 contentful-entries_productionv3 _doc 6sbdRDRWJXTTtbR1wycE52 None [] 6sbdRDRWJXTTtbR1wycE52 profile 2022-05-15T11:21:20.388Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.863200770107277, 'Lat': 50.46117977010727}, 'Northeast': {'Lng': 4.865900429892721, 'Lat': 50.46387942989271}}, 'coordinates': [4.864224099999999, 50.462539], 'type': 'Point', 'Location': {'Lng': 4.864224099999999, 'Lat': 50.462539}}, 'Metadata': {'PlaceId': 'ChIJa-SkInKZwUcRsc1Xs-GqwSE', 'AddressType': 'head office', 'Timestamp': '2022-05-07T18:43:01.598Z'}, 'FormattedAddress': 'Rue des Fossés Fleuris 42, 5000 Namur, Belgique', 'MainAddress': True}] [0891973792] [{'Metadata': {'Timestamp': '2022-05-07T18:43:01.459Z'}, 'URL': ''}] Training in IT following based on four subjects: office applications, web and image, web marketing and communication, personnel management and development. Formations en informatique suivant quatre thématiques: bureautique, web et image, webmarketing et communication, management et développement personnel. NaN NaN NaN NaN NaN NaN logo-f-1-formation.jpg 350.0 77.0 5569.0 image/jpeg // NaN logo-f-1-formation.jpg 1-formationbe 1-formationbe [{'Metadata': {'Timestamp': '2022-05-07T18:43:01.459Z'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-05-07T18:43:01.459Z'}, 'URL': ''}] NaN NaN NaN NaN NaN NaN NaN NaN
3 contentful-entries_productionv3 _doc 4EuOqP1eQIeka5xHcoq5mQ None [] 4EuOqP1eQIeka5xHcoq5mQ profile 2022-05-15T11:21:23.274Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.863200770107277, 'Lat': 50.46117977010727}, 'Northeast': {'Lng': 4.865900429892721, 'Lat': 50.46387942989271}}, 'coordinates': [4.864224099999999, 50.462539], 'type': 'Point', 'Location': {'Lng': 4.864224099999999, 'Lat': 50.462539}}, 'Metadata': {'PlaceId': 'ChIJa-SkInKZwUcRsc1Xs-GqwSE', 'AddressType': 'head office', 'Timestamp': '2022-05-07T18:51:39.745Z'}, 'FormattedAddress': 'Rue des Fossés Fleuris 42, 5000 Namur, Belgique', 'MainAddress': True}] [0891973792] [] Communications agency and IT training centre: website creation, professional SEO, the creation of Google Adwords campaigns, copywriting and web content, visual identity creation, communications consulting. Agence de communication et centre de formation informatique: création de sites web, référencement professionnel, création et gestion de campagnes Google AdWords, copywriting et écriture web, création d'identité visuelle, conseil en communication. NaN NaN NaN NaN NaN NaN marque-cp52u3ifgt9us27gak951f15p6-1369821343-position.png 169.0 129.0 11128.0 image/png // NaN marque-cp52u3ifgt9us27gak951f15p6-1369821343-position.png 1-positionbe 1-positionbe [{'Metadata': {'Timestamp': '2022-05-07T18:51:39.679Z'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-05-07T18:51:39.679Z'}, 'URL': ''}, {'Metadata': {'Timestamp': '2022-05-07T18:51:39.679Z'}, 'URL': ''}] NaN NaN NaN NaN NaN NaN NaN NaN
4 contentful-entries_productionv3 _doc 1VvYEZncg0lEDL8RzGAvmE None [123 Automation Engineering & Development] 1VvYEZncg0lEDL8RzGAvmE profile 2022-05-15T05:25:51.214Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.456926070107278, 'Lat': 50.53833147010727}, 'Northeast': {'Lng': 4.459625729892722, 'Lat': 50.54103112989272}}, 'coordinates': [4.4582759, 50.5396813], 'type': 'Point', 'Location': {'Lng': 4.4582759, 'Lat': 50.5396813}}, 'Metadata': {'PlaceId': 'EjNSdWUgZGVzIEFydGlzYW5zIDQsIDYyMTAgTGVzIEJvbnMgVmlsbGVycywgQmVsZ2lxdWUiGhIYChQKEgn75Aq3dyzCRxFEh7hEj1NdPBAE', 'AddressType': 'head office', 'Timestamp': '2022-05-07T15:17:32.918Z'}, 'FormattedAddress': 'Rue des Artisans 4, 6210 Les Bons Villers, Belgique', 'MainAddress': True}] [0820888531] [{'Metadata': {'Timestamp': '2022-05-07T15:17:32.867Z'}, 'URL': ''}] NaN Automation et robotique industrielle: étude, conception, développement, intégration et maintenance de solutions automatisées visant l’amélioration de la productivité dans les processus de fabrication quels qu’ils soient. NaN NaN NaN NaN NaN NaN 123automation.png 319.0 111.0 5802.0 image/png // NaN 123automation.png 123 Automation Engineering & Development 123 Automation Engineering & Development 123-automation 123-automation [] NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5360 contentful-entries_productionv3 _doc 1AbDfyZ4rHL18Bw6aiJKSA None [École Centrale des Arts et Métiers - HE Vinci] 1AbDfyZ4rHL18Bw6aiJKSA profile 2022-05-15T11:43:23.005Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.452325870107279, 'Lat': 50.84853592010727}, 'Northeast': {'Lng': 4.455025529892723, 'Lat': 50.85123557989272}}, 'coordinates': [4.4538028, 50.8499896], 'type': 'Point', 'Location': {'Lng': 4.4538028, 'Lat': 50.8499896}}, 'Metadata': {'PlaceId': 'ChIJwdgtpYbcw0cRfjW1nUhDNk8', 'AddressType': 'head office', 'Timestamp': '2022-05-07T15:44:19.720Z'}, 'FormattedAddress': 'Prom. de l'Alma 50, 1200 Woluwe-Saint-Lambert, Belgique', 'MainAddress': True}] [0459279954, 0409454123] [{'Metadata': {'Timestamp': '2022-05-07T15:44:19.660Z'}, 'URL': ''}] NaN L'ECAM est un Institut Supérieur Industriel ayant pour objet la formation de Master en sciences industrielles dans une des spécialités suivantes: automatisation, construction, électromécanique, électronique, géomètre, informatique, business analyst (alternance). NaN NaN NaN NaN NaN NaN ecam.jpg 512.0 512.0 93657.0 image/jpeg // NaN ecam.jpg École Centrale des Arts et Métiers - HE Vinci École Centrale des Arts et Métiers - HE Vinci ecole-centrale-des-arts-et-metiers ecole-centrale-des-arts-et-metiers [] ECAM ECAM NaN NaN NaN NaN NaN NaN
5361 contentful-entries_productionv3 _doc 5vp8xZpO6CucXtOmc1H8yR None [École communale fondamentale de Seneffe] 5vp8xZpO6CucXtOmc1H8yR profile 2022-05-15T09:12:19.246Z [{'Geometry': {'Viewport': {'Southwest': {'Lng': 4.252977370107278, 'Lat': 50.52898217010728}, 'Northeast': {'Lng': 4.255677029892722, 'Lat': 50.53168182989272}}, 'coordinates': [4.2543333, 50.5303456], 'type': 'Point', 'Location': {'Lng': 4.2543333, 'Lat': 50.5303456}}, 'Metadata': {'PlaceId': 'ChIJt1KItgg0wkcR6ekUYWMbdDg', 'AddressType': 'head office', 'Timestamp': '2022-05-07T18:58:11.863Z'}, 'FormattedAddress': 'Rue de Buisseret 19, 7180 Seneffe, Belgique', 'MainAddress': True}] NaN [] NaN Ecole fondamentale. NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN École communale fondamentale de Seneffe École communale fondamentale de Seneffe ecole-communale-de-seneffe ecole-communale-de-seneffe [] NaN NaN NaN NaN NaN NaN NaN NaN
This dataframe has 5365 rows × 40 columns. You can inspect the initial JSON response and dissect it further, in case you need more, less or other information from it.
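As one possible way of dissecting the response without pandas, the nested fields can be pulled out by dotted path. This is a sketch under assumptions: dig and extract_rows are hypothetical helpers, and the sample dictionary is made up to mimic the hits / hits / _source / fields layout suggested by the column names above, so the real response may differ.

```python
def dig(record, dotted_path, default=None):
    """Follow a dotted path such as '_source.fields.title.en' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def extract_rows(response_json, paths):
    """Build one flat dict per hit, keeping only the requested paths."""
    hits = response_json.get("hits", {}).get("hits", [])
    return [{path: dig(hit, path) for path in paths} for hit in hits]


# Made-up sample shaped like the API response (values are illustrative only).
sample = {
    "hits": {
        "hits": [
            {"_id": "a1", "_source": {"fields": {"title": {"en": "Example Co"},
                                                  "slug": {"en": "example-co"}}}},
            {"_id": "b2", "_source": {"fields": {"title": {"en": "Other Org"}}}},
        ]
    }
}

wanted = ["_id", "_source.fields.title.en", "_source.fields.slug.en"]
rows = extract_rows(sample, wanted)
for row in rows:
    print(row)
```

Missing fields come back as None instead of raising, which matters here because many entries in the dataframe above have NaN in several columns.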
Requests docs:
Pandas relevant documentation:
Upvotes: 1
Reputation: 33381
I would advise here about several issues:
You should use WebDriverWait, not implicitly_wait, since implicitly_wait waits for element presence only, while with WebDriverWait you can wait for more mature element states such as visible, clickable and more.
You should not mix WebDriverWait and implicitly_wait in the same file; mixing explicit and implicit waits may cause unpredictable wait times.
The pager buttons are at the bottom of the page, so you will need to scroll down and only after that click the pager button.
import time
from selenium import webdriver
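from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC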
options = Options()
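# Raw string so the backslashes in the Windows path are not treated as escapes
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
url = ""
driver.get(url)  # load the page before waiting on its elements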
actions = ActionChains(driver)
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="axeptio_btn_acceptAll"]'))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="axeptio_btn_configure"]'))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="axeptio_btn_acceptAllAndNext"]'))).click()
driver.execute_script("window.scrollBy(0, arguments[0]);", 800)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.next a'))).click()
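Instead of scrolling by a fixed 800 pixels, one option is to scroll the pager link itself into view before clicking it. A minimal sketch, not tested against this page: scroll_into_view_and_click is a hypothetical helper that takes an existing driver and an already located element, and it relies only on the driver's execute_script method and the browser's standard scrollIntoView DOM method.

```python
def scroll_into_view_and_click(driver, element):
    """Scroll `element` to the middle of the viewport, then click it."""
    # arguments[0] is the element passed as the second argument below
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()
```

The element could come from wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.next a'))), using the same wait object and selector as in the code above.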
Upvotes: 1