Anatol Zakrividoroga

Reputation: 4518

How to extract a number from a hyperlink with BeautifulSoup

I am trying to extract the number 808 from this hyperlink:

<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>

I wrote the code below, but it returns []. I am not sure what I need to change to extract the number 808 as easily as possible.

I would highly appreciate some input!

seller_feedback_span = soup.findAll("div", {"class": "a-link-normal feedback-detail-description"})
print(seller_feedback_span)

Upvotes: 0

Views: 61

Answers (3)

RomanPerekhrest

Reputation: 92854

Using soup.select with a specific regex pattern:

from bs4 import BeautifulSoup
import re

html_data = '''<a class="a-link-normal feedback-detail-description" href="#">
<b>100% positive</b> in the last 12 months (808 ratings)</a>'''

soup = BeautifulSoup(html_data, 'html.parser')
seller_feedback_span = soup.select("a.a-link-normal.feedback-detail-description b")
# The text node right after <b> holds "in the last 12 months (808 ratings)"
rating = re.search(r'\d+(?=\s*ratings)', seller_feedback_span[0].next_sibling).group()

print(rating)   # 808
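
As a minimal sketch of the same regex idea, the lookup can be wrapped in a helper that returns None instead of raising when the link or the pattern is missing. The function name extract_rating_count is hypothetical (not part of the answer above), and the pattern assumes the count always appears as "(N ratings)":

```python
import re
from bs4 import BeautifulSoup


def extract_rating_count(html):
    """Return the rating count as an int, or None if it cannot be found.

    Hypothetical helper: assumes the count appears as "(N ratings)"
    inside an <a> with class "feedback-detail-description".
    """
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("a.feedback-detail-description")
    if link is None:
        return None
    # Search the full link text rather than a single sibling node
    match = re.search(r"\((\d+)\s*ratings\)", link.get_text())
    return int(match.group(1)) if match else None


html_data = '''<a class="a-link-normal feedback-detail-description" href="#">
<b>100% positive</b> in the last 12 months (808 ratings)</a>'''

print(extract_rating_count(html_data))                  # 808
print(extract_rating_count("<p>no feedback link</p>"))  # None
```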

Upvotes: 0

KunduK

Reputation: 33384

Use a CSS selector, which is fast at retrieving data:

from bs4 import BeautifulSoup

data = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
soup = BeautifulSoup(data, 'html.parser')
item = soup.select_one('.feedback-detail-description').text.split('(')[1].split('ratings')[0].strip()

print(item)

Output:

808
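
The chained one-liner can be hard to follow, so here is a step-by-step sketch of the same split logic. To keep it runnable without bs4, the link text is hard-coded as a plain string (an assumption; in practice it would come from the element's .text), and the intermediate variable names are only illustrative:

```python
# Text of the link as BeautifulSoup would return it via .text (hard-coded here)
text = "100% positive in the last 12 months (808 ratings)"

after_paren = text.split("(")[1]               # "808 ratings)"
before_word = after_paren.split("ratings")[0]  # "808 "
rating = before_word.strip()                   # "808"

# The result is still a string; convert it if a number is needed
rating_count = int(rating)
print(rating_count)  # 808
```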

Upvotes: 1

chitown88

Reputation: 28595

from bs4 import BeautifulSoup

html = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''

soup = BeautifulSoup(html, 'html.parser')

# Search for <a> tags: the element in question is a link, not a <div>
seller_feedback_span = soup.find_all("a", {"class": "a-link-normal feedback-detail-description"})
#print(seller_feedback_span)

for feedback in seller_feedback_span:
    # Take the text after the last "(" and before the word "ratings"
    rating = feedback.text.split('(')[-1].split('ratings')[0].strip()
    print(rating)

Output:

808
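
The key difference from the code in the question is the tag name: the element is an <a>, not a <div>, so a search for "div" has nothing to match. A minimal sketch illustrating this on the same HTML snippet (the variable names as_div and as_a are only for illustration):

```python
from bs4 import BeautifulSoup

html = '''<a class="a-link-normal feedback-detail-description" href="#"><b>100% positive</b> in the last 12 months (808 ratings)</a>'''
soup = BeautifulSoup(html, "html.parser")

# Searching for a <div> with this class finds nothing
as_div = soup.find_all("div", class_="feedback-detail-description")
# Searching for an <a> with this class finds the link
as_a = soup.find_all("a", class_="feedback-detail-description")

print(as_div)     # []
print(len(as_a))  # 1
```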

Upvotes: 1
