Liban West

Reputation: 43

How to open multiple hrefs within a webtable to scrape through selenium

I'm trying to scrape this website using Python and Selenium. However, not all the information I need is on the main page, so how would I click the links in the 'Application number' column one by one, go to that page, scrape the information, then return to the original page?

I've tried:

def getData():
  data = []
  select = Select(driver.find_elements_by_xpath('//*[@id="node-41"]/div/div/div/div/div/div[1]/table/tbody/tr/td/a/@href'))
  list_options = select.options
  for item in range(len(list_options)):
    item.click()
  driver.get(url)
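
Conceptually, the flow I'm after is something like this (the locator is just a guess based on the XPath above, minus the /@href part, and I'm not sure this is the right approach):

links = driver.find_elements_by_xpath('//*[@id="node-41"]//table/tbody/tr/td/a')  # guessed locator for the 'Application number' links
hrefs = [link.get_attribute('href') for link in links]  # collect the URLs up front so they don't go stale
for href in hrefs:
    driver.get(href)  # open the application's detail page
    # scrape the information here
    driver.get(url)   # return to the original page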

URL: http://www.scilly.gov.uk/planning-development/planning-applications

Screenshot of the site: [screenshot]

Upvotes: 0

Views: 365

Answers (3)

undetected Selenium

Reputation: 193108

To open the multiple hrefs within the webtable and scrape them through Selenium you can use the following solution:

  • Code Block:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
    
      hrefs = []
      options = Options()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      options.add_argument("--disable-gpu")
      options.add_argument("--no-sandbox")
      driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
      driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
      windows_before  = driver.current_window_handle # Store the parent_window_handle for future use
      elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a"))) # Induce WebDriverWait for the visibility of the desired elements
      for element in elements:
          hrefs.append(element.get_attribute("href")) # Collect the required href attributes and store in a list
      for href in hrefs:
          driver.execute_script("window.open('" + href +"');") # Open the hrefs one by one through execute_script method in a new tab
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2)) # Induce  WebDriverWait for the number_of_windows_to_be 2
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0] # Identify the newly opened window
          # driver.switch_to_window(new_window) <!---deprecated>
          driver.switch_to.window(new_window) # switch_to the new window
          # perform your webscraping here
          print(driver.title) # print the page title or your perform your webscraping
          driver.close() # close the window
          # driver.switch_to_window(windows_before) <!---deprecated>
          driver.switch_to.window(windows_before) # switch_to the parent_window_handle
      driver.quit() #Quit your program
    
  • Console Output:

      Planning application: P/18/064 | Council of the ISLES OF SCILLY
      Planning application: P/18/063 | Council of the ISLES OF SCILLY
      Planning application: P/18/062 | Council of the ISLES OF SCILLY
      Planning application: P/18/061 | Council of the ISLES OF SCILLY
      Planning application: p/18/059 | Council of the ISLES OF SCILLY
      Planning application: P/18/058 | Council of the ISLES OF SCILLY
      Planning application: P/18/057 | Council of the ISLES OF SCILLY
      Planning application: P/18/056 | Council of the ISLES OF SCILLY
      Planning application: P/18/055 | Council of the ISLES OF SCILLY
      Planning application: P/18/054 | Council of the ISLES OF SCILLY
    

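If you don't need separate tabs, a simpler variation of the same idea is to collect the href values first and then visit each one with driver.get() in the same window. A minimal sketch reusing the same CSS selector (what you scrape on each detail page is up to you; driver.title is only a placeholder):

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a")))
    hrefs = [element.get_attribute("href") for element in elements]  # grab the URLs before navigating away
    results = []
    for href in hrefs:
        driver.get(href)  # navigate directly to each application page
        results.append(driver.title)  # placeholder; replace with the fields you actually need
    print(results)
    driver.quit()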

Upvotes: 1

theGuy

Reputation: 693

When you navigate to a new page the DOM is refreshed, so you cannot reuse a stored list of elements here. Here is my approach for this action (I don't code much in Python, so syntax and indentation may be broken):

count = len(driver.find_elements_by_xpath("//table[@class='views-table cols-6']/tbody/tr"))  # count the total number of links
j = 1
while j <= count:
    driver.find_element_by_xpath("//table[@class='views-table cols-6']/tbody/tr[" + str(j) + "]/td/a").click()

    # add wait here
    # do your scrape action here

    driver.find_element_by_xpath("//a[text()='Back to planning applications']").click()  # to go back to the main page

    # add wait here for the main page to load
    j += 1
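
For the "add wait here" placeholders, an explicit wait is usually more reliable than a fixed sleep. A minimal sketch using the driver from the snippet above (the 'Back to planning applications' link is used only as an example of something to wait for):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# wait until the detail page has loaded before scraping
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//a[text()='Back to planning applications']")))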

Upvotes: 0

Julian Silvestri

Reputation: 2027

What you can do is the following:

import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "url"
browser = webdriver.Chrome()  # or whatever driver you use
browser.get(url)
# compound class names don't work with find_element_by_class_name, so use a CSS selector instead
browser.find_element_by_css_selector("td.views-field.views-field-title > a").click()
# or use this: browser.find_element_by_xpath("xpath")
# Note: you will need to change the selector to click a different item in the table
time.sleep(5)  # not the best way to do this, but it's simple; just to make sure things load
# it is here that you will be able to scrape the new URL; I will not post that as you can scrape what you want
# when you are done scraping you can return to the previous page with this
browser.execute_script("window.history.go(-1)")

hope this is what you are looking for.
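
As a side note, WebDriver's built-in back() does the same thing as running window.history.go(-1) through execute_script:

browser.back()  # equivalent to browser.execute_script("window.history.go(-1)")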

Upvotes: 0
