Reputation: 5
I am trying to create a DataFrame of reviews for 20 banks. In the following code I am trying to get the rating score from each of the 20 customer reviews, but I am finding it difficult because I am new to BeautifulSoup and web scraping.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')
Rating = []
rat_elem = soup.find_all('span')
for rate in rat_elem:
    Rating.append(rate.find_all('div').get('value'))
print(Rating)
Upvotes: 0
Views: 183
Reputation: 729
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')
# Find all the span elements where the "itemprop" attribute is "ratingvalue".
Rating = [item.text for item in soup.find_all('span', attrs={"itemprop":"ratingvalue"})]
print(Rating)
# The output
# ['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']
Reference: the BeautifulSoup documentation on keyword arguments.
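To see why filtering on the attribute works where the question's loop did not, here is a minimal offline sketch. It assumes markup shaped like the hypothetical SAMPLE_HTML below (the real page may be structured differently). The key point is that find_all returns a list-like ResultSet, so calling .get on it directly, as in the question, raises an AttributeError; each tag has to be read individually.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the review page; the real page may differ.
SAMPLE_HTML = """
<div class="review"><span itemprop="ratingvalue">4.0</span></div>
<div class="review"><span itemprop="ratingvalue">4.5</span></div>
<div class="review"><span class="author">someone</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet (list-like), so read each matching tag in turn.
spans = soup.find_all("span", attrs={"itemprop": "ratingvalue"})

# The rating is the text inside the span, not an attribute, so use get_text.
ratings = [float(span.get_text(strip=True)) for span in spans]
print(ratings)
```

The span without the itemprop attribute is skipped, and converting to float is optional but convenient if the scores will be averaged later.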
Upvotes: 0
Reputation: 1734
I prefer using CSS selectors, so you should be able to target all the spans whose itemprop attribute is set to ratingvalue.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')
Rating = []
for rate in soup.select('span[itemprop=ratingvalue]'):
    Rating.append(rate.get_text())
print(Rating)
Relevant output
['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']
EDIT: add relevant output
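Since the question's goal is a DataFrame, the selector result can be loaded into pandas in one more step. This is only a sketch under assumptions: SAMPLE_HTML is a hypothetical stand-in for the fetched page, and the column name "rating" is my own choice, not something from the site.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for page.text; the real page may differ.
SAMPLE_HTML = """
<div class="review"><span itemprop="ratingvalue">5.0</span></div>
<div class="review"><span itemprop="ratingvalue">4.0</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same CSS attribute selector as above, collected with a list comprehension.
ratings = [tag.get_text(strip=True) for tag in soup.select('span[itemprop="ratingvalue"]')]

# Convert the strings to numbers so the column can be aggregated later.
df = pd.DataFrame({"rating": pd.to_numeric(ratings)})
print(df)
```

Against the live site, SAMPLE_HTML would be replaced by page.text from the requests call shown earlier.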
Upvotes: 2