mcfoyt

Reputation: 31

Trying to find if certain text exists

I'm trying to check whether the text "Nationally registered" appears on the profile pages of a website I am scraping. It comes right after the text "Licensed to work in: ". If the page contains it, I will write the license type to a CSV file as "Nationally registered"; if it does not, I will write "state" instead. That is the logic I am trying to implement.

Here's the link to the profile page I am testing my code on: https://www.zillow.com/lender-profile/zackdisinger/

It keeps printing False. Below is the code I am trying:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
time.sleep(2)
show_more_button = driver.find_element_by_class_name('zsg-wrapper-footer').click()
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'html.parser')


if soup.find(text='Nationally registered'):
    print('Success')
else:
    print('False')

Upvotes: 1

Views: 239

Answers (4)

QHarr

Reputation: 84465

With bs4 4.7.1 you can use :contains to check for a p tag containing that string. I've printed True/False, though it is easy to adapt to Success/False.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".zsg-wrapper-footer a"))).click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
data = soup.select_one('p:contains("Nationally registered")')
print(data is not None)
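
To see how the selector behaves without launching a browser, here is a minimal sketch against a hand-written HTML fragment. The markup is an assumption; the real profile page may be structured differently. Newer Soup Sieve releases (2.1 and later) deprecate :contains in favour of :-soup-contains, so the sketch uses the newer name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real profile page markup may differ.
html = """
<div>
  <p>Licensed to work in: Nationally registered</p>
  <p>NMLS# 1053091</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match: a text node must equal the whole string, so nothing is found here.
exact = soup.find(string="Nationally registered")

# Substring match via the CSS pseudo-class (needs soupsieve 2.1 or later).
partial = soup.select_one('p:-soup-contains("Nationally registered")')

print(exact is not None)
print(partial is not None)
```

With this fragment the exact lookup finds nothing while the selector finds the p tag, which matches the behaviour described in the question.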

Upvotes: 2

KunduK

Reputation: 33384

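Use a regular expression (the re module) to check whether the text exists. A compiled pattern matches the text anywhere inside a text node, whereas a plain string has to equal the node's entire text. Here is your code with that change: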

from selenium import webdriver
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import re

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
show_more_button =WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(.,'Show')][contains(.,'more')]")))
#driver.execute_script("arguments[0].click();", show_more_button)
show_more_button.click()
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'html.parser')


if soup.find(text=re.compile('Nationally registered')):
    print('Success')
else:
    print('False')

It prints Success on the console:

Success
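
Once the check works, the CSV step described in the question could be sketched roughly as follows. The helper name license_type, the column names, and the second sample profile are all hypothetical, and the sketch writes to an in-memory buffer rather than a real file.

```python
import csv
import io
import re


def license_type(page_text):
    """Return the CSV label for a profile, given its visible text."""
    if re.search(r"Nationally registered", page_text):
        return "Nationally registered"
    return "state"


# Hypothetical sample texts standing in for scraped profile pages.
profiles = {
    "zackdisinger": "Licensed to work in: Nationally registered",
    "example-lender": "Licensed to work in: IN, OH",
}

# In-memory buffer; use open("lenders.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["screen_name", "license"])
for name, text in profiles.items():
    writer.writerow([name, license_type(text)])

print(buffer.getvalue())
```

In the Selenium script, the page text could come from soup.get_text() after the "Show more" click.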

Upvotes: 1

Sureshmani Kalirajan

Reputation: 1938

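Try a conditional block like this, using the driver from your code:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    # find_element raises NoSuchElementException if no matching <p> exists
    found = driver.find_element(By.XPATH, "//p[contains(text(),'Nationally registered')]").is_displayed()
except NoSuchElementException:
    found = False

if found:
    print('Success')
else:
    print('False')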

Upvotes: 0

Andrej Kesely

Reputation: 195468

The data is loaded through AJAX from a different URL, so you can query that endpoint directly:

import re
import requests
import json

url = 'https://www.zillow.com/lender-profile/zackdisinger/'
screen_name = [i for i in url.split('/') if i][-1]
r = requests.get(url).text

url_json = 'https://mortgageapi.zillow.com/getRegisteredLender?partnerId=' + re.search(r'"partnerId":"(.*?)"', r).group(1)
payload = {"fields":["aboutMe","address","cellPhone","contactLenderFormDisclaimer","companyName","employerMemberFDIC","employerScreenName","equalHousingLogo","faxPhone","hideCellPhone","imageId","individualName","languagesSpoken","memberFDIC","nationallyRegistered","nmlsId","nmlsType","officePhone","rating","screenName","stateLicenses","stateSponsorships","title","totalReviews","website"],"lenderRef":{"screenName":screen_name}}
data = requests.post(url_json, json=payload).json()
print(json.dumps(data, indent=4))
print()
print('Is nationally registered =', data['lender']['nationallyRegistered'])

Prints:

{
    "lender": {
        "aboutMe": "From day one I provide the utmost relational-based experience to make you feel comfortable with your home financing decisions.\n\nEmpowerment and integrity is key to successfully making a home loan a smooth process from start to finish. Acquiring a mortgage in today's market takes product knowledge and underwriting know how. Every client has their own story, their own future. I am here to match today's mortgages to clients dreams of home-ownership.\n",
        "address": {
            "address": "10412 Allisonville Rd Suite 50",
            "city": "Fishers",
            "stateAbbreviation": "IN",
            "zipCode": "46038"
        },
        "companyName": "Bank of England Mortgage",
        "employerMemberFDIC": true,
        "employerScreenName": "BoEMortgage",
        "equalHousingLogo": "EqualHousingLender",
        "faxPhone": {
            "areaCode": "317",
            "number": "3754",
            "prefix": "536"
        },
        "id": "ZU101hnzx7ntuyx_8z2sb",
        "imageId": "2910837992a9cc44d31c26bd7532d2dd",
        "individualName": {
            "firstName": "Zachary",
            "lastName": "Disinger"
        },
        "languagesSpoken": [],
        "nationallyRegistered": true,
        "nmlsId": 1053091,
        "nmlsType": "Individual",
        "officePhone": {
            "areaCode": "317",
            "number": "0416",
            "prefix": "252"
        },
        "rating": 5.0,
        "screenName": "zackdisinger",
        "stateLicenses": {},
        "stateSponsorships": {},
        "title": "Mortgage Banker",
        "totalReviews": 120,
        "website": "http://boeindy.com"
    }
}

Is nationally registered = True
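
To turn that flag into the label the question wants, something like the sketch below might do. The function name license_label is hypothetical, and the sample is a trimmed copy of the response printed above; missing keys fall back to "state".

```python
import json

# Trimmed copy of the response printed above.
sample = '{"lender": {"screenName": "zackdisinger", "nationallyRegistered": true, "stateLicenses": {}}}'


def license_label(response):
    """Map the API response to the label used in the CSV file."""
    lender = response.get("lender", {})
    if lender.get("nationallyRegistered"):
        return "Nationally registered"
    return "state"


data = json.loads(sample)
print(license_label(data))
```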

Upvotes: 1
