Reputation: 31
I'm trying to get the cell phone/office phone number information off of this website: https://www.zillow.com/lender-profile/DougShoemaker/
I've tried playing around with bs4, but I can only get the first phone number. I'm trying to get both the office and cell numbers.
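from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(service=Service(r'C:\Users\mfoytlin\Desktop\chromedriver.exe'))
driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
time.sleep(2)  # give the page time to render before reading its source
soup = BeautifulSoup(driver.page_source, 'html.parser')
# find_element returns only the FIRST element that matches the class name
phoneNum = driver.find_element(By.CLASS_NAME, 'zsg-list_definition')
trial = phoneNum.find_element(By.CLASS_NAME, 'zsg-sm-hide')
print(trial.text)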
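The reason only one number comes back is that find_element (like BeautifulSoup's find) stops at the first match; Selenium's plural find_elements returns all of them. Here is a minimal standard-library sketch of the same first-match vs. all-matches distinction using xml.etree.ElementTree. The markup is a hypothetical simplification of the profile page (the real HTML may differ), and the numbers are taken from the outputs printed in the answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real Zillow page is not guaranteed to look like this.
SAMPLE = """
<dl>
  <dt>Office</dt>
  <dd class="zsg-list_definition"><span class="zsg-sm-hide">(618) 619-4120</span></dd>
  <dt>Cell</dt>
  <dd class="zsg-list_definition"><span class="zsg-sm-hide">(618) 795-0790</span></dd>
</dl>
"""

root = ET.fromstring(SAMPLE)

# find() stops at the first match, just like find_element()
first = root.find(".//dd[@class='zsg-list_definition']/span")
print(first.text)  # only the office number

# findall() returns every match, analogous to Selenium's find_elements()
all_numbers = [span.text for span in root.findall(".//dd[@class='zsg-list_definition']/span")]
print(all_numbers)
```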
Upvotes: 0
Views: 88
Reputation: 11101
You don't have to use Selenium, or even BeautifulSoup. If you inspect the network requests in Developer Tools (F12) > Network, you can see that the data is fetched using an XHR request.
You can make this request yourself and use the JSON response any way you like.
POST https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Referer: https://www.zillow.com/lender-profile/DougShoemaker/
Content-Type: application/json
{
    "fields": [
        "aboutMe",
        "address",
        "cellPhone",
        # ... other fields
        "website"
    ],
    "lenderRef": {
        "screenName": "DougShoemaker"
    }
}
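If you'd rather not depend on requests, the same POST can be prepared with the standard library. This sketch only builds the urllib.request.Request object and does not send it. The endpoint, partnerId, header names and field names are copied from the request above; build_lender_request is a hypothetical helper name. Since this is an internal endpoint observed in the Network tab, its behaviour may change.

```python
import json
import urllib.request

API_URL = 'https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ'


def build_lender_request(screen_name, fields):
    """Hypothetical helper: prepare (but do not send) the POST seen in the Network tab."""
    body = {
        "fields": list(fields),
        "lenderRef": {"screenName": screen_name},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Referer': 'https://www.zillow.com/lender-profile/{}/'.format(screen_name),
        },
        method='POST',
    )


req = build_lender_request('DougShoemaker', ['cellPhone', 'officePhone'])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```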
Now, with the requests library, you can try:
import requests

if __name__ == '__main__':
    # Ask the lender API only for the fields we need
    payload = {
        "fields": [
            "screenName",
            "cellPhone",
            "officePhone",
            "title",
        ],
        "lenderRef": {
            "screenName": "DougShoemaker"
        }
    }
    res = requests.post('https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ',
                        json=payload, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors
    data = res.json()
    # Each phone number is an object with areaCode, prefix and number keys
    cellphone, office_phone = data['lender']['cellPhone'], data['lender']['officePhone']
    cellphone_num = '({areaCode}) {prefix}-{number}'.format(**cellphone)
    office_phone_num = '({areaCode}) {prefix}-{number}'.format(**office_phone)
    print(office_phone_num, cellphone_num)
which prints:
(618) 619-4120 (618) 795-0790
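The formatting step assumes each phone object carries areaCode, prefix and number keys, as the format string above implies. Below is a small sketch of a helper that also copes with a lender who has no cell or office number. Returning None for a missing or empty field is an assumption, since the response for a missing number isn't shown here, and the sample dict is hypothetical, reconstructed from the printed output.

```python
def format_phone(phone):
    """Format a {'areaCode', 'prefix', 'number'} dict as '(AAA) PPP-NNNN'.

    Returns None when the field is missing or empty (assumed API behaviour).
    """
    if not phone:
        return None
    return '({areaCode}) {prefix}-{number}'.format(**phone)


# Hypothetical response, reconstructed from the output printed above
data = {
    'lender': {
        'officePhone': {'areaCode': '618', 'prefix': '619', 'number': '4120'},
        'cellPhone': {'areaCode': '618', 'prefix': '795', 'number': '0790'},
    }
}

numbers = {key: format_phone(data['lender'].get(key)) for key in ('officePhone', 'cellPhone')}
print(numbers)
print(format_phone(None))  # a missing number yields None instead of a KeyError
```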
Upvotes: 2
Reputation: 193208
To extract the Office, Cell and Fax numbers, you have to use WebDriverWait with the visibility_of_element_located() expected condition, and you can use the following locator strategy:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
# options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
# For each label, wait up to 20 s for the <span> in the first <dd> after the matching <dt>
for label in ('Office', 'Cell', 'Fax'):
    xpath = "//dt[text()='{}']//following::dd[1]//span".format(label)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, xpath))).get_attribute("innerHTML"))
driver.quit()
Console Output:
(618) 619-4120
(618) 795-0790
(618) 619-4120
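WebDriverWait(driver, 20).until(condition) calls the condition repeatedly until it returns something truthy or the 20-second timeout expires, which is why it is more reliable than a fixed time.sleep(2). Here is a simplified, standalone sketch of that polling idea. wait_until is a hypothetical name, and this is not Selenium's implementation, which also ignores NoSuchElementException by default and raises its own TimeoutException.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Simplified illustration of the WebDriverWait polling loop, not Selenium's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} s'.format(timeout))
        time.sleep(poll)


# Simulate an element that only "appears" on the third check
attempts = []


def fake_element_visible():
    attempts.append(1)
    return '(618) 619-4120' if len(attempts) >= 3 else None


print(wait_until(fake_element_visible, timeout=5, poll=0.01))
```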
Upvotes: 0
Reputation: 1938
Try the following XPath for each phone number:
Office Phone:
//dt[contains(text(),'Office')]/following-sibling::dd/div/span
Cell Phone:
//dt[contains(text(),'Cell')]/following-sibling::dd/div/span
Fax Number:
//dt[contains(text(),'Fax')]/following-sibling::dd/div/span
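These XPaths rely on the page pairing each <dt> label with the <dd> that follows it. A standard-library sketch of the same idea walks a <dl> and maps each label to the span text in the next <dd>. The markup is a hypothetical simplification matching the dt/dd/div/span shape these XPaths imply, label_to_value is a hypothetical helper, and the numbers come from the console output shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the dt/dd/div/span structure the XPaths target
SAMPLE = """
<dl>
  <dt>Office</dt><dd><div><span>(618) 619-4120</span></div></dd>
  <dt>Cell</dt><dd><div><span>(618) 795-0790</span></div></dd>
  <dt>Fax</dt><dd><div><span>(618) 619-4120</span></div></dd>
</dl>
"""


def label_to_value(dl):
    """Map each <dt> text to the span text of the <dd> that directly follows it."""
    result = {}
    label = None
    for child in dl:
        if child.tag == 'dt':
            label = (child.text or '').strip()
        elif child.tag == 'dd' and label is not None:
            span = child.find('./div/span')
            result[label] = span.text if span is not None else None
            label = None  # each <dt> pairs with exactly one following <dd>
    return result


phones = label_to_value(ET.fromstring(SAMPLE))
print(phones['Office'], phones['Cell'], phones['Fax'])
```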
Upvotes: 0