Reputation: 423
I am trying to scrape the address from the page below:
https://www.yelp.com/biz/rollin-phatties-houston
But I get only the first part of the address (i.e. 1731 Westheimer Rd) instead of the complete, comma-separated address:
1731 Westheimer Rd, Houston, TX 77098
Can anyone help me with this? My code is below:
import bs4 as bs
import urllib.request as url
source = url.urlopen('https://www.yelp.com/biz/rollin-phatties-houston')
soup = bs.BeautifulSoup(source, 'html.parser')
mains = soup.find_all("div", {"class": "secondaryAttributes__09f24__3db5x arrange-unit__09f24__1gZC1 border-color--default__09f24__R1nRO"})
main = mains[0] #First item of mains
address = []
for main in mains:
    try:
        address.append(main.address.find("p").text)
    except:
        address.append("")
print(address)
# 1731 Westheimer Rd
Upvotes: 0
Views: 580
Reputation: 52
The business address shown on the webpage is generated dynamically. If you view the page source of the URL, you will find that the restaurant's address is stored in a script element, so you need to extract it from there.
from bs4 import BeautifulSoup
import requests
import json
page = requests.get('https://www.yelp.com/biz/rollin-phatties-houston')
htmlpage = BeautifulSoup(page.text, 'html.parser')
scriptelements = htmlpage.find_all('script', attrs={'type':'application/json'})
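# The third JSON script held the business data when this was written; the index may change.
scriptcontent = scriptelements[2].text
# The JSON is wrapped in HTML comment markers, so strip them before parsing.
scriptcontent = scriptcontent.replace('<!--', '')
scriptcontent = scriptcontent.replace('-->', '')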
jsondata = json.loads(scriptcontent)
print(jsondata['bizDetailsPageProps']['bizContactInfoProps']['businessAddress'])
Using the above code, you should be able to extract the address of any business whose Yelp page has the same structure.
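Since a fixed index such as scriptelements[2] can break when the page layout changes, one option is to check every JSON script block and keep the first one that contains the expected keys. The following is only a rough sketch under assumptions: it uses the standard-library HTMLParser instead of BeautifulSoup so that it runs without extra packages, SAMPLE_HTML is a made-up stand-in for the real page, and JsonScriptCollector and extract_address are hypothetical helper names, not part of any library.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collects the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current block.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_address(html):
    """Return the first businessAddress found in any JSON script block, else None."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        # Strip the HTML comment markers that wrap the JSON payload.
        text = block.replace("<!--", "").replace("-->", "").strip()
        try:
            data = json.loads(text)
            return data["bizDetailsPageProps"]["bizContactInfoProps"]["businessAddress"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # Not valid JSON, or not the block we are looking for.
    return None


# Assumption: a tiny hand-written page that mimics the structure described above.
SAMPLE_HTML = """
<html><body>
<script type="application/json"><!--{"locale": "en_US"}--></script>
<script type="application/json"><!--{"bizDetailsPageProps": {"bizContactInfoProps": {"businessAddress": "1731 Westheimer Rd, Houston, TX 77098"}}}--></script>
</body></html>
"""

print(extract_address(SAMPLE_HTML))
```

For a live page you would pass page.text to extract_address; whether the key path still exists there depends on Yelp's current markup.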
Upvotes: 1
Reputation: 11525
import requests
import re
from ast import literal_eval
def main(url):
    r = requests.get(url)
    match = literal_eval(
        re.search(r'addressLines.+?(\[.+?])', r.text).group(1))
    print(*match)
main('https://www.yelp.com/biz/rollin-phatties-houston')
Output:
1731 Westheimer Rd Houston, TX 77098
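To see what the regular expression above captures, here is a small offline sketch. It assumes the page source contains something like the sample string below (a made-up fragment, the real markup may differ), and address_lines is a hypothetical helper name. The pattern skips lazily from the word addressLines to the first bracketed list, literal_eval turns that text into a Python list, and joining with ", " gives the comma-separated form asked for in the question.

```python
import re
from ast import literal_eval

# Assumption: a hand-written fragment resembling the embedded page data.
sample = 'x"addressLines": ["1731 Westheimer Rd", "Houston, TX 77098"], "y": [1]'


def address_lines(text):
    """Return the list that follows 'addressLines', or [] if it is not present."""
    found = re.search(r'addressLines.+?(\[.+?])', text)
    if found is None:
        return []
    # group(1) is the bracketed list as text; literal_eval parses it safely.
    return literal_eval(found.group(1))


lines = address_lines(sample)
print(", ".join(lines))
```

Returning an empty list when nothing matches avoids the AttributeError that calling .group(1) on None would raise.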
Upvotes: 2
Reputation: 1432
There is no need to locate the address by inspecting the rendered element: the data is already delivered with the page inside a script element. You can get it with the following code:
import chompjs
import bs4 as bs
import urllib.request as url
source = url.urlopen('https://www.yelp.com/biz/rollin-phatties-houston')
soup = bs.BeautifulSoup(source, 'html.parser')
javascript = soup.select("script")[16].string
data = chompjs.parse_js_object(javascript)
print(data['bizDetailsPageProps']['bizContactInfoProps']['businessAddress'])
Upvotes: 1