Corncobpipe

Reputation: 51

Cycling through pages to webscrape

I am trying to scrape data from this site: http://surge.srcc.lsu.edu/s1.html. So far my code loops through the drop-down menus, and I also want to loop through the page links at the top of the table ([1] [2] ... etc.). I tried to use Select, but I get an error saying that Select cannot be used with a span: "UnexpectedTagNameException: Select only works on <select> elements, not on <span>".

# importing libraries
from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import re

driver = webdriver.Firefox()
driver.get("http://surge.srcc.lsu.edu/s1.html")

# definition for switching frames
def frame_switch(css_selector):
  driver.switch_to.frame(driver.find_element_by_css_selector(css_selector))  

# data is in an iframe
frame_switch("iframe")

html_source = driver.page_source
nameSelect = Select(driver.find_element_by_xpath('//select[@id="storm_name"]'))
stormCount = len(nameSelect.options)
data=[]
for i in range(1, stormCount):
    print("starting loop on option storm " + nameSelect.options[i].text)
    nameSelect.select_by_index(i)
    time.sleep(3)


    yearSelect = Select(driver.find_element_by_xpath('//select[@id="year"]'))
    yearCount = len(yearSelect.options)
    for j in range(1, yearCount):
        print("starting loop on option year " + yearSelect.options[j].text)
        yearSelect.select_by_index(j)


        time.sleep(2)

This is where I am having issues selecting the page:

        change_page=Select(driver.find_element_by_class_name("yui-pg-pages"))
        page_count = len(change_page.options)
        for k in range(1, page_count):
            change_page.select_by_index(k)



            # Select Page & run following code
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            print(soup.find_all("tbody", {"class" : re.compile(".*")})[1])
            # get the needed table body
            table=soup.find_all("tbody", {"class" : re.compile(".*")})[1] 
            rows = table.find_all('tr')
            for row in rows:
                cols = row.find_all('td')
                cols = [ele.text.strip() for ele in cols]
                data.append(cols)

Upvotes: 1

Views: 155

Answers (1)

RattleyCooper

Reputation: 5207

Use an XPath selector to find the "next" link instead.

driver.find_element_by_xpath('//a[@class="yui-pg-next"]')  

Then just loop for as long as the next button can be clicked. I prefer this method when the number of pages can change while I'm looping through them. You shouldn't need Select here: it only works on <select> elements, that is, drop-down menus.
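
A rough sketch of that "keep clicking next" loop is below. It is not tested against the site in the question. The helper names scrape_all_pages and make_next_clicker, the max_pages cap and the delay parameter are all made up for illustration. It also assumes the paginator stops rendering the next control as an <a> on the last page, so the XPath matches nothing there. find_elements is used instead of find_element because it returns an empty list rather than raising when nothing matches.

```python
import time


def scrape_all_pages(read_rows, go_to_next_page, max_pages=1000):
    """Collect rows from the current page, then keep paging until no next page is left."""
    data = []
    for _ in range(max_pages):  # hard cap so a stuck paginator cannot loop forever
        data.extend(read_rows())
        if not go_to_next_page():
            break
    return data


def make_next_clicker(driver, xpath='//a[@class="yui-pg-next"]', delay=2):
    """Return a function that clicks the "next" link, or reports that none is left."""
    def go_to_next_page():
        # "xpath" is the string value of selenium's By.XPATH.
        # find_elements returns an empty list instead of raising when nothing matches.
        links = driver.find_elements("xpath", xpath)
        if not links:
            # Assumption: on the last page the "next" control is no longer an <a>.
            return False
        links[0].click()
        time.sleep(delay)  # crude wait for the table to redraw, as in the question
        return True
    return go_to_next_page
```

In the question's code this would replace the Select-based page loop: pass in a small function that parses driver.page_source and returns the rows of the current page (hypothetical, built from the BeautifulSoup lines already there), and extend data with the result of scrape_all_pages. Keeping the paging logic separate from the driver calls also makes it easy to try out with a stand-in driver before pointing it at the real site.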

Or if you need to do it using the page links because the pages don't change often, you could try something like:

# Use find_elements_by_xpath to select multiple elements.
pages = driver.find_elements_by_xpath('//a[@class="yui-pg-page"]')

# loop through results
for page_link in pages:
    page_link.click()
    # do stuff.

Upvotes: 1
