Find Text of Sibling Element, Where Original Element Matches Specific String

Question

I want to scrape some prices out of a bunch of HTML tables. The tables contain all sorts of prices, and of course the table data tags don't contain anything useful.

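<div id="item-price-data">
  <table>
    <tbody>
      <tr>
        <td>Normal Price:</td>
        <td>$100.00</td>
      </tr>
      <tr>
        <td>Member Price:</td>
        <td>$90.00</td>
      </tr>
      <tr>
        <td>Sale Price:</td>
        <td>$80.00</td>
      </tr>
      <tr>
        <td>You save:</td>
        <td>$20.00</td>
      </tr>
    </tbody>
  </table>
</div>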

The only prices that I care about are those that are paired with an element that has "Normal Price" as its text.

What I'd like to be able to do is scan the table's descendants, find the tag that has that text, then pull the text from its sibling.

The problem I'm having is that in BeautifulSoup the descendants attribute returns a list of NavigableString, not Tag.

So if I do this:

from bs4 import BeautifulSoup
from urllib import request

html = request.urlopen(url)
soup = BeautifulSoup(html, 'lxml')

div = soup.find('div', {'id': 'item-price-data'})
table_data = div.find_all('td')

for element in table_data:
    if element.get_text() == 'Normal Price:':
        price = element.next_sibling

print(price)

I get nothing. Is there an easy way to get the string value back?

Sede · Accepted Answer

You can use the find_next() method; you may also need a bit of regex:

Demo:

>>> import re
>>> from bs4 import BeautifulSoup
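>>> html = """<div id="item-price-data">
...   <table>
...     <tbody>
...       <tr>
...         <td>Normal Price:</td>
...         <td>$100.00</td>
...       </tr>
...       <tr>
...         <td>Member Price:</td>
...         <td>$90.00</td>
...       </tr>
...       <tr>
...         <td>Sale Price:</td>
...         <td>$80.00</td>
...       </tr>
...       <tr>
...         <td>You save:</td>
...         <td>$20.00</td>
...       </tr>
...     </tbody>
...   </table>
... </div>"""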
>>> soup = BeautifulSoup(html, 'lxml')
>>> div = soup.find('div', {'id': 'item-price-data'})
>>> for element in div.find_all('td', string=re.compile('Normal Price')):
...     price = element.find_next('td')
...     print(price.get_text())
... 
$100.00
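
One caveat worth knowing: find_next() walks forward through the whole document, not just the current row, so if a label cell has no value cell next to it you get the first cell of the following row instead. Here is a minimal sketch of the difference, assuming made-up markup (the ROWS string is hypothetical, not from the question) and the built-in html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first row has a label cell but no value cell.
ROWS = """
<table>
  <tr><td>Normal Price:</td></tr>
  <tr><td>Member Price:</td><td>$90.00</td></tr>
</table>
"""

soup = BeautifulSoup(ROWS, "html.parser")
label = soup.find("td", string="Normal Price:")

# find_next() searches everything that follows in document order,
# so it crosses into the next <tr>.
via_find_next = label.find_next("td")

# find_next_sibling() only looks at nodes that share the same parent <tr>.
via_sibling = label.find_next_sibling("td")

print(via_find_next.get_text())  # Member Price:
print(via_sibling)               # None
```

If the tables you scrape always have a value cell right after the label, either method gives the same result; find_next_sibling() is just the stricter choice.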

If you don't want to bring regex into this then the following will work for you.

>>> table_data = div.find_all('td')
>>> for element in table_data:
...     if 'Normal Price' in element.get_text():
...         price = element.find_next('td')
...         print(price.get_text())
... 
$100.00
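
As for why the loop in the question printed nothing: next_sibling returns the very next node in the tree, and with indented markup that node is the whitespace between the two cells, not the next tag. A small sketch that shows this, assuming sample markup shaped like the table in the question (SAMPLE_HTML is my own stand-in) and the built-in html.parser:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup mirroring the table in the question.
SAMPLE_HTML = """
<div id="item-price-data">
  <table>
    <tr>
      <td>Normal Price:</td>
      <td>$100.00</td>
    </tr>
    <tr>
      <td>Member Price:</td>
      <td>$90.00</td>
    </tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
label_cell = soup.find("td", string="Normal Price:")

# The immediate sibling is the newline and indentation between the cells.
raw_sibling = label_cell.next_sibling
print(isinstance(raw_sibling, NavigableString), repr(str(raw_sibling)))

# find_next_sibling() skips text nodes and returns the next matching Tag.
value_cell = label_cell.find_next_sibling("td")
normal_price = value_cell.get_text(strip=True)
print(normal_price)  # $100.00
```

Printing that whitespace string is what looked like "nothing" in the original loop.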
