Reputation: 4518
I am trying to extract the number 808 from this hyperlink:
<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>
I have written the code below, but it returns []. I am not sure what I need to add to extract the number 808 as easily as possible.
I would really appreciate some input!
seller_feedback_span = soup.findAll("div", {"class": "a-link-normal feedback-detail-description"})
print(seller_feedback_span)
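A likely reason for the empty result is that the element in the snippet is an <a> tag, not a <div>, so findAll("div", ...) has nothing to match. The sketch below (assuming bs4 is installed; the regex is an illustrative choice, not taken from the question) shows the original call returning [] and a corrected lookup on "a". It also relies on the fact that BeautifulSoup matches a single class name against each value of a multi-valued class attribute.

```python
import re
from bs4 import BeautifulSoup

html = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
soup = BeautifulSoup(html, "html.parser")

# The original call asks for <div> tags, but the element is an <a>, so nothing matches.
print(soup.find_all("div", {"class": "a-link-normal feedback-detail-description"}))  # []

# Searching for <a> tags finds the link; one of its class names is enough to match.
links = soup.find_all("a", class_="feedback-detail-description")
text = links[0].get_text()  # "100% positive in the last 12 months (808 ratings)"

# Illustrative pattern: capture the digits inside "(... ratings)".
match = re.search(r"\((\d+)\s*ratings?\)", text)
print(int(match.group(1)))  # 808
```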
Upvotes: 0
Views: 61
Reputation: 92854
Using soup.select with a CSS selector and a specific regex pattern:
from bs4 import BeautifulSoup
import re
html_data = '''<a class="a-link-normal feedback-detail-description" href="#">
<b>100% positive</b> in the last 12 months (808 ratings)</a>'''
soup = BeautifulSoup(html_data, 'html.parser')
seller_feedback_span = soup.select("a.a-link-normal.feedback-detail-description b")
rating = re.search(r'\d+(?=\s*ratings)', seller_feedback_span[0].next_sibling).group()
print(rating) # 808
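One caveat with \d+ on its own: if the count were ever written with a thousands separator such as "1,234 ratings" (a hypothetical input; the question only shows 808), the pattern above would capture just "234". A minimal sketch of a more tolerant pattern, using a hypothetical helper name parse_rating_count, could look like this:

```python
import re

def parse_rating_count(text):
    """Hypothetical helper: return the integer before 'rating'/'ratings', or None.

    Accepts digits with commas (e.g. "1,234") and strips the commas before converting.
    """
    match = re.search(r"([\d,]+)\s*ratings?\b", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

print(parse_rating_count(" in the last 12 months (808 ratings)"))    # 808
print(parse_rating_count(" in the last 12 months (1,234 ratings)"))  # 1234
print(parse_rating_count("no ratings yet"))                          # None
```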
Upvotes: 0
Reputation: 33384
Use a CSS selector with select_one(), then split the link text around the parentheses:
from bs4 import BeautifulSoup
data='''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
soup=BeautifulSoup(data,'html.parser')
item=soup.select_one('.feedback-detail-description').text.split('(')[1].split('ratings')[0].strip()
print(item)
Output:
808
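The chained split() above raises an error if the selector matches nothing (select_one() returns None) or if the text has no "(". A minimal defensive variant, assuming the same markup and using a hypothetical function name rating_count_from_html, swaps split() for str.partition(), which never raises, and returns None instead:

```python
from bs4 import BeautifulSoup

def rating_count_from_html(html):
    """Hypothetical helper: return the text between '(' and 'ratings', or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(".feedback-detail-description")
    if link is None:  # the selector matched nothing
        return None
    # partition() returns three parts even when the separator is missing
    _, _, after_paren = link.get_text().partition("(")
    count, sep, _ = after_paren.partition("ratings")
    if not sep:  # no "ratings" after the "("
        return None
    return count.strip()

data = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
print(rating_count_from_html(data))              # 808
print(rating_count_from_html("<p>nothing</p>"))  # None
```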
Upvotes: 1
Reputation: 28595
html = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
seller_feedback_span = soup.findAll("a", {"class": "a-link-normal feedback-detail-description"})
#print(seller_feedback_span)
for feedback in seller_feedback_span:
    rating = feedback.text.split('(')[-1].split('ratings')[0].strip()
    print(rating)
Output:
808
Upvotes: 1