theraredram

Reputation: 51

How to get text from span tag in BeautifulSoup in loop

I'm trying to scrape some information from a website that has the following HTML repeated several times:

<div class="product-details">
   <h2 class="product-name" title=" Weekly Roundup"> Weekly Roundup</h2>
   <span class="reference-number">REF NO. A1400.5</span>

I'm trying to scrape the product name and the text "REF NO. A1400.5". I need to scrape several product names and reference numbers on the same page and store them within a list. I tried:

product_new = []
product_ref = []

for caption in soup.find_all(class_='product-details'):
    product_name_new = caption.find(class_='product-name').text
    product_new.append(product_name_new)
    product_name_ref = (soup.select_one("span[class*=reference]").text)
    product_ref.append(product_name_ref)    
product_size_new = len(product_new)
print("Setup Complete", product_size_new)
print(*product_new,sep='\n')
print(*product_ref,sep='\n')

product_new works perfectly and returns a list of all product names. However, product_ref contains only REF NO. A1400.5, repeated once for each time the reference class appears on the page. How can I change this so that it stores the reference text of each product on the page?

Thank you!

Upvotes: 1

Views: 992

Answers (2)

Andrej Kesely

Reputation: 195593

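In your code, product_name_ref is always the same value because you are selecting from soup (the whole document), not from caption (the current product). soup.select_one returns the first match on the page every time.

To get the desired information, you can use this example: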

from bs4 import BeautifulSoup


txt = '''
<div class="product-details">
   <h2 class="product-name" title=" Weekly Roundup"> Weekly Roundup</h2>
   <span class="reference-number">REF NO. A1400.5</span>
</div>

<div class="product-details">
   <h2 class="product-name" title=" Weekly Roundup"> Weekly Roundup 2</h2>
   <span class="reference-number">REF NO. A1400.5 2</span>
</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

product_new = []
product_ref = []

for product in soup.select('div.product-details'):
    product_new.append(product.h2.get_text(strip=True))
    product_ref.append(product.select_one('span.reference-number').get_text(strip=True))

print(product_new)
print(product_ref)

Prints:

['Weekly Roundup', 'Weekly Roundup 2']
['REF NO. A1400.5', 'REF NO. A1400.5 2']

EDIT: To skip products where the name or the reference is missing:

product_new = []
product_ref = []

for product in soup.select('div.product-details'):
    n = product.h2
    r = product.select_one('span.reference-number')

    if n and r:
        product_new.append(n.get_text(strip=True))
        product_ref.append(r.get_text(strip=True))

print(product_new)
print(product_ref)
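
The guard matters when a product block lacks one of the two elements. Here is a minimal sketch, assuming a hypothetical page where the second product has no reference span (the markup below is invented for illustration and is not taken from the site in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second product has no reference-number span.
html = '''
<div class="product-details">
   <h2 class="product-name">Weekly Roundup</h2>
   <span class="reference-number">REF NO. A1400.5</span>
</div>
<div class="product-details">
   <h2 class="product-name">Weekly Roundup 2</h2>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

product_new = []
product_ref = []

for product in soup.select('div.product-details'):
    n = product.h2
    r = product.select_one('span.reference-number')

    # select_one returns None when nothing matches, so calling
    # r.get_text() without this check would raise AttributeError.
    if n and r:
        product_new.append(n.get_text(strip=True))
        product_ref.append(r.get_text(strip=True))

print(product_new)
print(product_ref)
```

Because the incomplete product is skipped entirely, the two lists stay the same length and their entries stay paired by index.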

EDIT 2: To keep only the reference code (the last whitespace-separated token of the span text):

from bs4 import BeautifulSoup


txt = '''
<div class="product-details">
   <h2 class="product-name" title=" Weekly Roundup"> Weekly Roundup</h2>
   <span class="reference-number">REF NO. A1400.5</span>
</div>

<div class="product-details">
   <h2 class="product-name" title=" Weekly Roundup"> Weekly Roundup 2</h2>
   <span class="reference-number">REF NO. A1400.6</span>
</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

product_new = []
product_ref = []

for product in soup.select('div.product-details'):
    n = product.h2
    r = product.select_one('span.reference-number')

    if n and r:
        product_new.append(n.get_text(strip=True))
        product_ref.append(r.get_text(strip=True).rsplit(maxsplit=1)[-1])

print(product_new)
print(product_ref)

Prints:

['Weekly Roundup', 'Weekly Roundup 2']
['A1400.5', 'A1400.6']
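
To see what the rsplit call does on its own, here is a small standalone sketch. The string is the span text from the example, and the variable names are only illustrative:

```python
ref_text = "REF NO. A1400.5"

# rsplit works from the right. With no separator it splits on whitespace,
# and maxsplit=1 stops after a single split.
parts = ref_text.rsplit(maxsplit=1)
print(parts)   # ['REF NO.', 'A1400.5']

# The last element is the reference code.
code = parts[-1]
print(code)    # A1400.5
```

If the text contains no whitespace, parts has a single element and [-1] returns the whole string. If the text is empty, rsplit returns an empty list and [-1] raises IndexError, so this approach assumes every span has some text.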

EDIT 3: To print the names and references side by side:

for a, b in zip(product_new, product_ref):
    print('{:<30} {}'.format(a, b))

Prints:

Weekly Roundup                 A1400.5
Weekly Roundup 2               A1400.6
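
The fixed width of 30 works for short names. As a possible variation (a sketch only, with the two lists hard-coded so that it runs on its own), the column width can be computed from the longest name:

```python
product_new = ['Weekly Roundup', 'Weekly Roundup 2']
product_ref = ['A1400.5', 'A1400.6']

# Width of the longest name plus two spaces of padding.
width = max(len(name) for name in product_new) + 2

# Left-align each name in a column of that width, then append the reference.
lines = [f'{name:<{width}}{ref}' for name, ref in zip(product_new, product_ref)]
print('\n'.join(lines))
```

Note that max raises ValueError on an empty list, so this assumes at least one product was found.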

Upvotes: 1

datamansahil

Reputation: 412

Try correcting the class name in the selector for the reference number, using the code below:

product_name_ref = (soup.select_one("span[class*=reference-number]").text)

Upvotes: 0
