Rabiyulfahim

Reputation: 107

Scrape data after clicking an element (or automatically following an href link) on a webpage during web scraping

I want to scrape data that is only available after clicking an element (or following an href link) on a webpage. Please note that there is no XPath to click. Please guide me; I am new to clicking individual elements.

https://www.yelp.com/search?find_desc=Gastroenterologist&find_loc=Houston%2C+TX+77002 - I am able to scrape this search page, but I do not know how to scrape the individual elements and tags behind each result. Please guide me with reference code; any other method is also fine. Thanks in advance.

Individual link - https://www.yelp.com/biz/john-clemmons-jr-md-houston?osq=Gastroenterologist

# Required outputs are:
#   1. phone number - (713) 526-4263
#   2. address      - 1200 Binz St Ste 1025 Park Plaza Medical Associates Houston, TX 77004
#   3. web address  - http://www.Parkplazamed.com

# format = [phone_number1, phone_number2, ...]
from bs4 import BeautifulSoup
from csv import writer
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'}
web_page = 'https://www.yelp.com/search?find_desc=Gastroenterologist&find_loc=Houston%2C+TX+77002&ns=1'

with requests.Session() as session:
    # Fetch the search results page and raise on an HTTP error status
    (r := session.get(web_page, headers=headers)).raise_for_status()
    # process content from here
print(r.text)

soup = BeautifulSoup(r.text, 'lxml')
print(soup.prettify())

# Class name copied from the page markup; it looks auto-generated and may change
doctor_links = soup.find_all('a', attrs={'class': 'css-1422juy'})

doctor_n = []

# Collect the visible text of every matching link
for title in doctor_links:
    doctor_n.append(title.text.strip())
print(doctor_n)

Upvotes: 1

Views: 148

Answers (1)

Andrej Kesely

Reputation: 195418

To get the data for the local business, you can parse the JSON data embedded in the page. For example:

import json
import requests
from bs4 import BeautifulSoup


url = "https://www.yelp.com/biz/john-clemmons-jr-md-houston?osq=Gastroenterologist"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = {}
for d in soup.select('[type="application/ld+json"]'):
    d = json.loads(d.contents[0])
    data[d["@type"]] = d


print(data["LocalBusiness"]["name"])
print(data["LocalBusiness"]["telephone"])
print(data["LocalBusiness"]["address"])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

Prints:

John Clemmons Jr, MD
(713) 528-6562
{
    "streetAddress": "1213 Hermann Dr\nSte 420",
    "addressLocality": "Houston",
    "addressCountry": "US",
    "addressRegion": "TX",
    "postalCode": "77004",
}
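
To cover the "auto click" part of the question, one option is to skip clicking altogether: collect the business links from the search page and run the same JSON parsing on each of them. Below is a minimal sketch of that idea, not a tested scraper. It assumes that result links on the search page start with `/biz/` (as the individual link in the question does) and that each business page carries a `LocalBusiness` JSON block like the one above. The names `business_links`, `local_business`, `format_address` and `scrape`, and the `fetch` parameter, are made up for this example.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.yelp.com"


def business_links(search_html, base_url=BASE_URL):
    """Return absolute, de-duplicated business links from a search results page."""
    soup = BeautifulSoup(search_html, "html.parser")
    links = []
    # Assumption: every result links to a path that starts with "/biz/".
    for a in soup.select('a[href^="/biz/"]'):
        # Drop the query string so the same business is not listed twice.
        url = urljoin(base_url, a["href"].split("?")[0])
        if url not in links:
            links.append(url)
    return links


def local_business(page_html):
    """Return the LocalBusiness JSON-LD object of a business page, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    for tag in soup.select('script[type="application/ld+json"]'):
        try:
            item = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        if isinstance(item, dict) and item.get("@type") == "LocalBusiness":
            return item
    return None


def format_address(address):
    """Flatten the address dict into a single line."""
    region_zip = " ".join(
        p for p in (address.get("addressRegion", ""), address.get("postalCode", "")) if p
    )
    parts = [
        address.get("streetAddress", "").replace("\n", " "),
        address.get("addressLocality", ""),
        region_zip,
    ]
    return ", ".join(p for p in parts if p)


def scrape(search_html, fetch):
    """Visit every business linked from the search page and collect its details.

    `fetch` is any callable that takes a URL and returns that page's HTML.
    """
    rows = []
    for url in business_links(search_html):
        info = local_business(fetch(url))
        if info is None:
            continue  # page has no LocalBusiness block
        rows.append({
            "name": info.get("name"),
            "phone": info.get("telephone"),
            "address": format_address(info.get("address") or {}),
        })
    return rows
```

With `requests`, `fetch` could be something like `lambda url: requests.get(url, headers=headers).text`, and `search_html` would be the `r.text` from the question's code. Passing the download step in as a function keeps the parsing testable on saved HTML; adding a short pause between requests would be a sensible addition when fetching many pages.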

Upvotes: 1
