Reputation: 580
I'm trying to scrape the table from this webpage: http://carefacility.doe.louisiana.gov/covid19/List.aspx?parish=Orleans
I'm using Selenium because I need to click from page 1 through pages 2, 3, and 4 and scrape the table on each page. I'm paging with this call: driver.execute_script("__doPostBack('ctl00$MainContent$gvFacilityList','Page$2')")
However, I can't get it to scrape even the first table. The following code gives me no output whatsoever -- it does not even print "hi!".
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('http://carefacility.doe.louisiana.gov/covid19/List.aspx?parish=Orleans')
for tr in driver.find_elements_by_xpath('//*[@id="MainContent_gvFacilityList"]/table/tr'):
    print("hi!")
    tds = tr.find_elements_by_tag_name('td')
    print([td.text for td in tds])
I've read the other threads on Stack Overflow about this issue, but none of them makes it clear why I'm getting no results.
Upvotes: 0
Views: 88
Reputation: 580
I found a solution that does not require XPath at all. Instead, I save each page of results as an HTML file and parse the files afterward.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 20)
# base url
url = "http://carefacility.doe.louisiana.gov/covid19/List.aspx?parish=Orleans"
#scrape first page
driver.get(url)
print("scraping page 1")
with open('htmls/file1.html', 'w') as f:
    f.write(driver.page_source)
#scrape the other pages
script = [f"__doPostBack('ctl00$MainContent$gvFacilityList','Page${num}')" for num in range(2,5)]
script_counter = 1
for item in script:
    driver.get(url)
    driver.execute_script(item)
    script_counter += 1
    print(f"scraping page {script_counter}")
    with open(f'htmls/file{script_counter}.html', 'w') as f:
        f.write(driver.page_source)
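As a side note, the manual script_counter can be dropped by pairing each postback script with its page number up front. A small sketch of just that pairing logic (the Selenium calls and file writes are unchanged and only shown as comments here):

```python
# Build (page_number, postback_script) pairs for pages 2-4 of the GridView.
pages = [
    (num, f"__doPostBack('ctl00$MainContent$gvFacilityList','Page${num}')")
    for num in range(2, 5)
]

for page_num, script in pages:
    # In the real scraper: driver.get(url); driver.execute_script(script);
    # then save driver.page_source to f'htmls/file{page_num}.html'.
    print(f"scraping page {page_num}")
```

This keeps the page number and its script together, so the filename can never drift out of sync with the page that was actually scraped.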
Then I scrape each HTML file with BeautifulSoup. Scraping a table this way is simple: soup.find("table") returns the table element, and pandas.read_html turns it into a dataframe.
import pandas as pd
from bs4 import BeautifulSoup
import glob
files = glob.glob('htmls/*')
df_full = pd.DataFrame()
for file in files:
    with open(file, 'r') as f:
        content = f.read()
    soup = BeautifulSoup(content, 'html.parser')
    sp_table = soup.find("table")
    df = pd.read_html(str(sp_table))[0]
    df_full = df_full.append(df)
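One caveat: DataFrame.append was deprecated and removed in pandas 2.0, so the last line above will fail on newer pandas. A concat-based variant of the same accumulation step, sketched with stand-in frames (in the real loop each frame would come from pd.read_html):

```python
import pandas as pd

# Stand-ins for the per-page frames that pd.read_html(...)[0] would return.
frames = [
    pd.DataFrame({"Facility Name": ["Example A"], "City": ["New Orleans"]}),
    pd.DataFrame({"Facility Name": ["Example B"], "City": ["New Orleans"]}),
]

# Collect all page frames first, then combine them in one call;
# ignore_index renumbers the rows 0..n-1 instead of repeating each page's index.
df_full = pd.concat(frames, ignore_index=True)
```

Collecting the frames in a list and concatenating once is also faster than appending inside the loop, since each append copies the whole accumulated frame.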
Upvotes: 0
Reputation: 29382
In case you want to scrape the Facility Name column, sample code:
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.implicitly_wait(30)
driver.get('http://carefacility.doe.louisiana.gov/covid19/List.aspx?parish=Orleans')
for name in driver.find_elements(By.CSS_SELECTOR, "a[id^='MainContent_gvFacilityList_lbFacility']"):
    print(name.text)
Or, in case you want to scrape all the data:
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("http://carefacility.doe.louisiana.gov/covid19/List.aspx?parish=Orleans")
wait = WebDriverWait(driver, 20)
total_rows = len(driver.find_elements(By.XPATH, "//a[contains(@id,'MainContent_gvFacilityList_lbFacility')]"))
all_rows = driver.find_elements(By.XPATH, "//a[contains(@id,'MainContent_gvFacilityList_lbFacility')]")
name = []
license_type = []
age_range = []
city = []
for row in all_rows:
    name.append(row.text)
    license_type.append(row.find_element_by_xpath("../../following-sibling::td[1]").text)
    age_range.append(row.find_element_by_xpath("../../following-sibling::td[2]").text)
    city.append(row.find_element_by_xpath("../../following-sibling::td[3]").text)
print(name, license_type, age_range, city)
You would need these imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
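The four parallel lists collected in the loop can also be combined into one table instead of printing them side by side. A hedged sketch with stand-in values (pandas assumed available; the column names here are just illustrative choices):

```python
import pandas as pd

# Stand-in values for the lists built in the scraping loop above.
name = ["3 Sisters Academy", "Academy of the Sacred Heart Little Heart"]
license_type = ["Early Learning Center III", "Early Learning Center I"]
age_range = ["6 W To 12 Y", "6 W To 5 Y"]
city = ["New Orleans", "New Orleans"]

# One column per list; the lists must all be the same length,
# which holds here because each loop iteration appends to all four.
df = pd.DataFrame({
    "Facility Name": name,
    "License Type": license_type,
    "Age Range": age_range,
    "City": city,
})
print(df.shape)
```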
Output:
"C:\Program Files\Python39\python.exe" C:/Users/***/PycharmProjects/SeleniumSO/Chrome.py
['3 Sisters Academy', 'Academy of the Sacred Heart Little Heart', "ACS Children's House", "ACS Children's House Gentilly", 'Angel Care Learning Center', 'Angels Haven Daycare and Preschool', 'Anthonika Gidney', 'Audubon Primary Academy', 'Audubon Primary Preschool', 'Auntie B Learning Academy', 'Because Wee Care Learning Academy', 'Benjamin Thomas Academy', 'Bridgett White', 'Bright Horizons at Tulane University', 'Bright Minds Academy, LLC', "Carbo's Learning Express", "Carbo's Learning Express-East", 'Carolyn Green Ford Head Start', 'Carrollton-Dunbar Head Start Center', 'Changing Stages', "Children's College of Academics", "Children's Palace Learning Academy", "Children's Palace Learning Academy", "Children's Place Love Center Learning Academy", "Children's Place LTD", "Clara's Little Lambs Preschool #5", "Clara's Little Lambs Preschool Academy", 'Claras Little Lambs at Federal City', 'Coloring House Christian Academy', 'Covered Kids Learning Academy', 'Cream of the Crop', 'Creative Kidz East', 'Crescent Cradle at Cabrini High School', 'Cub Corner Preschool', 'Cuddly Bear Child Development Center', "D J's Learning Castle", 'Danielle Ann Varnado', 'Diana Head Start Center', 'Dionne Harvey', 'Discovery Kids Preschool and Daycare Center', "DJ's Learning Center LLC", 'Dr. Peter W. 
Dangerfield Head Start Center', 'Dryades YMCA Daycare', 'Early Discovery Child Care Center', 'Early Learning Center of NOBTS', 'Early Partners', 'Ecole Bilingue de la Nouvelle Orleans', 'Educare New Orleans', 'Ethel Woodard', 'First Academy Early Learning Center'] ['Early Learning Center III', 'Early Learning Center I', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Family Child Care Provider', 'Early Learning Center II', 'Early Learning Center II', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center II', 'Family Child Care Provider', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center II', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center I', 'Early Learning Center I', 'Early Learning Center III', 'Early Learning Center III', 'Family Child Care Provider', 'Early Learning Center III', 'Family Child Care Provider', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center III', 'Early Learning Center I', 'Early Learning Center III', 'In-Home Provider', 'Early Learning Center I'] ['6 W To 12 Y', '6 W To 5 Y', '3 Y To 6 Y', '3 Y To 6 Y', '6 W To 12 Y', '6 W To 13 Y', '0 Y To 12 Y', '6 W To 12 Y', '5 W To 12 Y', '3 W To 12 Y', '6 W To 16 Y', '6 W To 5 Y', '0 Y To 12 Y', '6 W To 5 Y', '6 W To 12 Y', '6 W To 12 Y', '6 W To 12 Y', '1 W To 5 Y', '35 M To 5 Y', '6 W To 16 Y', '6 W To 
12 Y', '6 W To 12 Y', '6 W To 12 Y', '6 W To 12 Y', '6 W To 4 Y', '1 W To 12 Y', '6 W To 12 Y', '6 W To 14 Y', '6 W To 17 Y', '6 W To 12 Y', '6 W To 12 Y', '6 W To 14 Y', '6 W To 4 Y', '6 W To 3 Y', '3 M To 12 Y', '6 W To 12 Y', '00 Y To 12 Y', '35 M To 5 Y', '00 Y To 12 Y', '6 W To 12 Y', '6 W To 12 Y', '34 M To 5 Y', '6 M To 12 Y', '8 W To 4 Y', '6 W To 12 Y', '3 Y To 4 Y', '18 M To 5 Y', '6 W To 5 Y', '0 Y To 12 Y', '6 W To 4 Y'] ['New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans', 'New Orleans']
Process finished with exit code 0
Upvotes: 1