Reputation: 338
I'm trying to scrape this website:
https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?behavioral-health=yes
I want to get the clinic names and addresses, and this is the Python code I'm using:
from selenium import webdriver
import pandas as pd
import time

#driver = webdriver.Chrome()
specialty = ["behavioral-health", "dermatology", "colon", "ear-nose-and-throat", "endocrine", "express", "family-practice", "foot-and-ankle",
             "gastroenterology", "heart-%26-vascular", "hepatobiliary-and-pancreas", "infectious-disease", "inpatient", "internal-medicine",
             "neurology", "nutrition", "ob%2Fgyn", "occupational-medicine", "oncology", "orthopedics", "osteoporosis", "pain-management",
             "pediatrics", "plastic-surgery", "pulmonary", "rehabilitation", "rheumatology", "sleep", "spine", "sports-medicine", "surgical", "urgent-care",
             "urology", "weight-loss", "wound-care", "pharmacy"]
name = []
address = []
for q in specialty:
    driver = webdriver.Chrome()
    driver.get("https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?" + q + "=yes")
    x = driver.find_element_by_class_name("loc-link-right")
    num_page = str(x.text).split(" ")
    x.click()
    for i in num_page:
        btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button[' + i + ']')
        btn.click()
        time.sleep(8)  # instead of this, use an explicit wait (see below)
        temp = driver.find_element_by_class_name("gray-background").text
        temp0 = temp.replace("Get directions Website View providers\n", "")
        x_temp = temp0.split("\n\n\n")
        for j in range(0, len(x_temp) - 1):
            temp1 = x_temp[j].split("Phone:")
            name.append(temp1[0].split("\n")[1])
            temp3 = temp1[1].split("Office hours:")
            temp4 = temp3[0].split("\n")
            temp5 = temp4[1:len(temp4)]
            address.append(" ".join(temp5))
    driver.close()
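By "waituntil" in the comment I mean Selenium's explicit wait (WebDriverWait). Something like this sketch is what I'd try instead of the fixed sleep; the 10-second timeout is an arbitrary choice on my part:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the results block to appear,
# instead of the fixed time.sleep(8)
temp = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "gray-background"))
).text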
This code works fine if I use it for only one specialty at a time, but when I pass the specialties in a loop as above, it fails in the second iteration with the error:
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 249, in execute
    self.error_handler.check_response(response)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
  (Session info: chrome=46.0.2490.80)
  (Driver info: chromedriver=2.19.346078 (6f1f0cde889532d48ce8242342d0b84f94b114a1), platform=Windows NT 6.1 SP1 x86_64)
I don't have much experience with Python; any help will be appreciated.
Upvotes: 1
Views: 496
Reputation: 336
Usually I would do this in SeleniumBasic, an Excel plugin; you can use the same logic in Python. I have tried this in VBA and it works fine for me.
Private assert As New assert
Private driver As New Selenium.ChromeDriver

Sub sel_novanHealth()
    Set ObjWB = ThisWorkbook
    Set ObjExl_Sheet1 = ObjWB.Worksheets("Sheet1")
    Dim Name As Variant
    'Open the website
    driver.Get "https://www.novanthealth.org/home/patients--visitors/locations.aspx"
    driver.Window.Maximize
    driver.Wait (1000)
    'Find out the total number of pages to be scraped
    lnth = driver.FindElementsByXPath("//button[@class='paginate_button']").Count
    'Loop over the pages
    For y = 2 To lnth
        'Loop over the elements on the current page
        For x = 1 To 10
            Name = driver.FindElementsByXPath("//div[@class='span12 loc-heading']")(x).Text
            'Element 2
            'Element 3
        Next x
        driver.FindElementsByXPath("//button[@class='paginate_button']")(y).Click
    Next y
    driver.Wait (1000)
End Sub
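Translated to Python, the same logic might look roughly like this. It is an untested sketch: the XPaths are copied from the VBA above and may need adjusting, the fixed inner loop is replaced by iterating over the found elements, and the sleeps mirror driver.Wait (1000).

from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://www.novanthealth.org/home/patients--visitors/locations.aspx")
driver.maximize_window()
time.sleep(1)

# Find out the total number of pages to be scraped
lnth = len(driver.find_elements_by_xpath("//button[@class='paginate_button']"))

names = []
for y in range(1, lnth):
    # Scrape the clinic headings on the current page
    for el in driver.find_elements_by_xpath("//div[@class='span12 loc-heading']"):
        names.append(el.text)
    # Re-find the buttons on every pass; the DOM changes after paging
    driver.find_elements_by_xpath("//button[@class='paginate_button']")[y].click()
    time.sleep(1)  # mirrors driver.Wait (1000) in the VBA

driver.quit()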
Upvotes: 1
Reputation: 1711
The error message already tells you why it does not work:
ElementNotVisibleException: Message: element not visible
The element is not visible because it sits outside the visible area of the browser. You have to scroll down the list, depending on the size of your browser window,
OR
just extract the data from the page source, which is easier.
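For instance, in the question's loop you could scroll each pagination button into view before clicking it, or skip the clicks for the current page and parse the rendered HTML instead. This is a rough sketch: driver and btn come from the question's code, and the BeautifulSoup selector div.loc-heading is an assumption based on the class names used above.

# Option 1: scroll the button into view so Selenium can click it
driver.execute_script("arguments[0].scrollIntoView(true);", btn)
btn.click()

# Option 2: extract the data from the page source instead of clicking
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, "html.parser")
for heading in soup.select("div.loc-heading"):
    print(heading.get_text(strip=True))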
Upvotes: 1