Reputation: 81
I'm using BeautifulSoup to scrape reviews. I have the scraping part working and am ready to write the results to a CSV file. Even after looking at many examples online, I still don't understand how to write to a CSV file. My scraping code is:
import requests
from bs4 import BeautifulSoup

for i in range(0,200,5):
    url = "https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or" + str(i) + "-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas"
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    response = requests.get(url, headers=headers, verify=False).text
    soup = BeautifulSoup(response, "lxml")
    reviews = soup.find_all('div', 'reviewSelector')
    for r in reviews:
        print("Rating: ", int(r.find('span','ui_bubble_rating')['class'][1].split('_')[1])/10)
        print("Review snippet: ", r.p.text)
To write to a CSV file, I tried wrapping my code in a csv.writer block:
with open('TA-reviews.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"')
    for i in range(0,200,5):
        url = "https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or" + str(i) + "-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas"
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
        response = requests.get(url, headers=headers, verify=False).text
        soup = BeautifulSoup(response, "lxml")
        reviews = soup.find_all('div', 'reviewSelector')
        for r in reviews:
            print("Rating: ", int(r.find('span','ui_bubble_rating')['class'][1].split('_')[1])/10)
            print("Review snipet: ", r.p.text)
            writer.writerow((rating, review))
Which returns an error that rating is undefined, yet one rating is printed out.
Upvotes: 0
Views: 58
Reputation: 77952
Which returns an error that rating is undefined
Of course rating is undefined. Where in your code do you have a statement binding anything to the name rating?
yet one rating is printed out
What you print out is the expression int(r.find('span','ui_bubble_rating')['class'][1].split('_')[1])/10. This does not define any rating variable.
You want:
for r in reviews:
    rating = int(r.find('span','ui_bubble_rating')['class'][1].split('_')[1])/10
    review = r.p.text
    writer.writerow((rating, review))
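For reference, here is a minimal, network-free sketch of the same pattern: bind each value to a name first, then pass those names to writer.writerow. The helper names parse_rating and write_reviews, the header row, and the sample data are all made up for illustration and are not part of your code; the class format 'bubble_NN' is assumed from the split('_') expression in the question.

```python
import csv
import io


def parse_rating(bubble_class):
    """Turn a class name such as 'bubble_45' into the rating 4.5 (hypothetical helper)."""
    return int(bubble_class.split('_')[1]) / 10


def write_reviews(parsed_reviews, csvfile):
    """Write one (rating, review) row per review to an already open file object."""
    writer = csv.writer(csvfile, delimiter=',', quotechar='"')
    writer.writerow(('rating', 'review'))  # header row, an addition for readability
    for bubble_class, text in parsed_reviews:
        rating = parse_rating(bubble_class)  # bind the name before using it
        review = text.strip()
        writer.writerow((rating, review))


# Made-up sample values standing in for what the scraper would extract
sample = [
    ('bubble_50', 'Great stay, friendly staff'),
    ('bubble_20', 'Room was "noisy", would not return'),
]

# An in-memory buffer is used here instead of open('TA-reviews.csv', 'w', newline='')
buffer = io.StringIO(newline='')
write_reviews(sample, buffer)
print(buffer.getvalue())
```

In the real script you would pass the file object from your with open(...) block instead of the in-memory buffer. Note that csv.writer quotes fields containing commas or quote characters by itself, so the review text needs no manual escaping.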
Upvotes: 1