sha_phys
sha_phys

Reputation: 1

difference between these two script: one works one does not, requests and urllib

import time
import random
from bs4 import BeautifulSoup as bs
import urllib
import urllib.request as url

html = urllib.request.urlopen('https://www.yelp.com/biz/the-stillery-nashville?osq=Restaurants+Nashville+Tn').read().decode('utf-8')
soup = bs(html, 'html.parser')

relevant= soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

for div in relevant:
        for html_class in div.find_all('span',class_="raw__09f24__T4Ezm"):
            text = html_class.find('span')
            review = html_class.getText()
            print(review)

This works. But this does not. I do not understand why the second is not working.

import time
import random
from bs4 import BeautifulSoup 
import urllib
import urllib.request as url
import html

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

url='https://www.yelp.com/biz/the-hampton-social-nashville-nashville-2?osq=Restaurants+Nashville+Tn'
response=requests.get(url, headers=headers)
soup2 = BeautifulSoup(response.text, 'html.parser')

relevant2= soup2.find_all('p', class_='comment__09f24__gu0rG css-qgunke')
for div in relevant2:
        for html_class in div.find_all('span',class_="raw__09f24__T4Ezm"):
            #text = html_class.find('span')
            review2 = html_class.get_text()
            print(review2)
            

I am looking to get the reviews.

I tried the code listed above for scraping restaurants from yelp datasets

Upvotes: 0

Views: 48

Answers (1)

shipskin
shipskin

Reputation: 1

From here https://stackoverflow.com/a/38114548

use requests.get(url, headers=headers, verify=False)

Upvotes: 0

Related Questions