mcfoyt

Reputation: 31

Can't extract all the information I need

I'm trying to get the cell phone and office phone numbers from this page: https://www.zillow.com/lender-profile/DougShoemaker/

I've tried playing around with bs4, but I can only get the first phone number. I'm trying to get both the office and cell numbers.

from selenium import webdriver
from bs4 import BeautifulSoup
import time


#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
soup = BeautifulSoup(driver.page_source, 'html.parser')
time.sleep(2)
phoneNum = driver.find_element_by_class_name('zsg-list_definition')
trial = phoneNum.find_element_by_class_name('zsg-sm-hide')
print(trial.text)

Upvotes: 0

Views: 88

Answers (3)

abdusco

Reputation: 11101

You don't have to use Selenium, or even BeautifulSoup. If you inspect the network requests in Developer Tools (F12) > Network, you can see that the data is fetched with an XHR request.

You can make this request yourself and use the JSON response any way you like.

POST https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Referer: https://www.zillow.com/lender-profile/DougShoemaker/
Content-Type: application/json

{
  "fields": [
    "aboutMe",
    "address",
    "cellPhone",
    # ... other fields
    "website"
  ],
  "lenderRef": {
    "screenName": "DougShoemaker"
  }
}

With the requests library, you can try:

import requests

if __name__ == '__main__':
    payload = {
        "fields": [
            "screenName",
            "cellPhone",
            "officePhone",
            "title",
        ],
        "lenderRef": {
            "screenName": "DougShoemaker"
        }
    }

    res = requests.post('https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ',
                        json=payload)
    res.raise_for_status()
    data = res.json()

    cellphone, office_phone = data['lender']['cellPhone'], data['lender']['officePhone']
    cellphone_num = '({areaCode}) {prefix}-{number}'.format(**cellphone)
    office_phone_num = '({areaCode}) {prefix}-{number}'.format(**office_phone)
    print(office_phone_num, cellphone_num)

which prints:

(618) 619-4120 (618) 795-0790
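
If a lender has no cell or office number listed, the lookup and `format(**...)` calls above would raise. Here is a minimal sketch of a more defensive version. It assumes the response has the shape the code above relies on (a `lender` object whose phone fields are dicts with `areaCode`, `prefix` and `number`); the helper names `format_phone` and `extract_phones` are made up for illustration, and the sample data is hand-written to match the output above rather than a captured response.

```python
def format_phone(phone):
    """Format a phone dict like {'areaCode': ..., 'prefix': ..., 'number': ...}.

    Returns None when the field is missing, empty, or incomplete.
    """
    if not phone:
        return None
    try:
        return '({areaCode}) {prefix}-{number}'.format(**phone)
    except KeyError:
        # One of the expected parts is absent from the response
        return None


def extract_phones(data, keys=('officePhone', 'cellPhone')):
    """Return {field name: formatted number or None} from the parsed JSON."""
    lender = data.get('lender') or {}
    return {key: format_phone(lender.get(key)) for key in keys}


if __name__ == '__main__':
    # Hand-written sample in the assumed response shape (not real API output)
    sample = {
        'lender': {
            'officePhone': {'areaCode': '618', 'prefix': '619', 'number': '4120'},
            'cellPhone': {'areaCode': '618', 'prefix': '795', 'number': '0790'},
        }
    }
    print(extract_phones(sample))
```

In the script above you would pass `res.json()` to `extract_phones` instead of the sample dict.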

Upvotes: 2

undetected Selenium

Reputation: 193208

To extract the Office, Cell and Fax numbers, you have to induce WebDriverWait for visibility_of_element_located(), and you can use the following XPath-based Locator Strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    # options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//dt[text()='Office']//following::dd[1]//span"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//dt[text()='Cell']//following::dd[1]//span"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//dt[text()='Fax']//following::dd[1]//span"))).get_attribute("innerHTML"))
    
  • Console Output:

    (618) 619-4120
    (618) 795-0790
    (618) 619-4120
    

Upvotes: 0

Sureshmani Kalirajan

Reputation: 1938

Try the following XPath expressions, one for each phone number:

Office Phone:
//dt[contains(text(),'Office')]/following-sibling::dd/div/span
Cell Phone:
//dt[contains(text(),'Cell')]/following-sibling::dd/div/span
Fax Number:
//dt[contains(text(),'Fax')]/following-sibling::dd/div/span
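
A rough sketch of how these expressions could be used from Python. The helper names `phone_xpath` and `read_phones` are made up for illustration, and `driver` is assumed to be a Selenium WebDriver that has already loaded the profile page. Note that `.text` only returns text Selenium considers visible, so a hidden span would come back empty.

```python
def phone_xpath(label):
    """Build the XPath above for a given <dt> label ('Office', 'Cell', 'Fax')."""
    return "//dt[contains(text(),'{}')]/following-sibling::dd/div/span".format(label)


def read_phones(driver, labels=("Office", "Cell", "Fax")):
    """Return {label: text} for every label that has a matching element.

    `driver` is assumed to be a Selenium WebDriver with the page already loaded.
    """
    phones = {}
    for label in labels:
        # find_elements returns an empty list (no exception) when nothing matches;
        # "xpath" is the string value of selenium's By.XPATH
        matches = driver.find_elements("xpath", phone_xpath(label))
        if matches:
            phones[label] = matches[0].text.strip()
    return phones
```

After `driver.get(...)` and a suitable wait, `read_phones(driver)` would give a dict with whichever of the three numbers are present on the page.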

Upvotes: 0
