Scrape data after an element click (or an automatic href link click) on a web page during web scraping

Question

I want to scrape data that only appears after clicking an element (or automatically following an href link) on a web page. Please note there is no XPath to click. Please guide me; I am new to clicking individual elements.

I am able to scrape the search results page https://www.yelp.com/search?find_desc=Gastroenterologist&find_loc=Houston%2C+TX+77002 but I do not know how to scrape the individual business pages and their elements and tags. Please guide me with reference code; any other method is also fine. Thanks in advance.

Individual link - https://www.yelp.com/biz/john-clemmons-jr-md-houston?osq=Gastroenterologist

# Required outputs:
#   1. phone number - (713) 526-4263
#   2. address      - 1200 Binz St Ste 1025 Park Plaza Medical Associates Houston, TX 77004
#   3. web address  - http://www.Parkplazamed.com
#
# format = [phone_number1, phone_number2, ...]

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'}
web_page = 'https://www.yelp.com/search?find_desc=Gastroenterologist&find_loc=Houston%2C+TX+77002&ns=1'

# Fetch the search results page; raise an error on a non-2xx response.
with requests.Session() as session:
    r = session.get(web_page, headers=headers)
    r.raise_for_status()

soup = BeautifulSoup(r.text, 'lxml')
print(soup.prettify())

# The result titles are the <a> tags with this class
# (the class name looks auto-generated, so it may change).
doctor_links = soup.find_all('a', attrs={'class': 'css-1422juy'})

doctor_n = []
for title in doctor_links:
    doctor_n.append(title.text.strip())
print(doctor_n)

Andrej Kesely · Accepted Answer

To get the data for the local business, you can parse the JSON data embedded in the page's application/ld+json script tags. For example:

import json
import requests
from bs4 import BeautifulSoup


url = "https://www.yelp.com/biz/john-clemmons-jr-md-houston?osq=Gastroenterologist"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Collect every JSON-LD block on the page, keyed by its "@type".
data = {}
for d in soup.select('[type="application/ld+json"]'):
    d = json.loads(d.contents[0])
    data[d["@type"]] = d


print(data["LocalBusiness"]["name"])
print(data["LocalBusiness"]["telephone"])
print(data["LocalBusiness"]["address"])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

Prints:

John Clemmons Jr, MD
(713) 528-6562
{
    "streetAddress": "1213 Hermann Dr\nSte 420",
    "addressLocality": "Houston",
    "addressCountry": "US",
    "addressRegion": "TX",
    "postalCode": "77004",
}
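
The question also asks how to get from the search results to each individual page without clicking anything. One possible approach is to collect the business links from the search page and request each of them directly, then apply the JSON-LD parsing shown above. The following is only a minimal sketch: the function names (extract_biz_links, parse_local_business, format_address, scrape_search) are hypothetical, and it assumes that business links on the search page are plain <a> tags whose href starts with /biz/, which is an assumption about Yelp's markup that may not hold or may change. Whether the website address is present in the JSON-LD is not shown in the output above, so it is left out here.

```python
import json
from urllib.parse import urljoin, urlsplit

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) "
                  "Gecko/20100101 Firefox/96.0"
}


def extract_biz_links(html, base_url="https://www.yelp.com"):
    """Return unique absolute business URLs found in the search-page HTML.

    Assumption: business links are <a> tags whose href starts with /biz/.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select('a[href^="/biz/"]'):
        # Drop the query string so the same business is not visited twice.
        path = urlsplit(a["href"]).path
        url = urljoin(base_url, path)
        if url not in links:
            links.append(url)
    return links


def parse_local_business(html):
    """Return the LocalBusiness JSON-LD block of a page, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select('script[type="application/ld+json"]'):
        try:
            block = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        if isinstance(block, dict) and block.get("@type") == "LocalBusiness":
            return block
    return None


def format_address(address):
    """Join the address fields into a single line."""
    region_zip = " ".join(
        p for p in (address.get("addressRegion", ""), address.get("postalCode", "")) if p
    )
    parts = [
        " ".join(address.get("streetAddress", "").split()),  # collapse the newline
        address.get("addressLocality", ""),
        region_zip,
    ]
    return ", ".join(p for p in parts if p)


def scrape_search(search_url):
    """Visit every business linked from the search page and collect its details."""
    results = []
    with requests.Session() as session:
        session.headers.update(HEADERS)
        r = session.get(search_url, timeout=30)
        r.raise_for_status()
        for url in extract_biz_links(r.text):
            page = session.get(url, timeout=30)
            page.raise_for_status()
            biz = parse_local_business(page.text)
            if biz is None:
                continue  # page had no LocalBusiness block
            results.append({
                "name": biz.get("name"),
                "phone": biz.get("telephone"),
                "address": format_address(biz.get("address", {})),
            })
    return results
```

Calling scrape_search(web_page) with the search URL from the question would then return one dictionary per business, and a list of phone numbers in the requested format could be built with [r["phone"] for r in results]. The site may return different markup to scripts or refuse automated requests, in which case the link list would simply come back empty.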
