Find Next Siblings not returning a value. How can I extract the two classes I need without the rest of the classes?

Question

I would like to extract just the item weight and the product dimensions from "content" below. What am I missing here? In my script, the content that I am looking for is not found. Is there a simpler way to just extract item weight and product dimensions? Thanks

import bs4 as bs

content = '''

Item Weight


0.16 ounces




Product Dimensions


4.8 x 3.4 x 0.5 inches




Batteries Included?


No




Batteries Required?


No


'''
soup = bs.BeautifulSoup(content, features='lxml')


try:
    product = {
        'weight': soup.find(text='Item Weight').parent.find_next_siblings(),
        'dimension': soup.find(text='Product Dimensions').parent.find_next_siblings()
    }
except:
    product = {
        'weight': 'item unavailable',
        'dimension': 'item unavailable'
    }
print(product)

Output:

{'weight': 'item unavailable', 'dimension': 'item unavailable'}

Rustam Garayev · Accepted Answer

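First of all, if you want only the immediate next sibling, use .find_next_sibling() instead of .find_next_siblings(), which returns a list of all following siblings. The reason you are not getting any output is that the text inside the tags is surrounded by newline characters, so the exact match on 'Item Weight' fails, soup.find() returns None, and accessing .parent on None raises an AttributeError that your bare except swallows. If you do: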

print([each_th.text for each_th in soup.find_all('th')])

You will see that the result would look like this:

['\nItem Weight\n', '\nProduct Dimensions\n', '\nBatteries Included?\n', '\nBatteries Required?\n']

So, you need to change text='Item Weight' to text='\nItem Weight\n' and so on:

try:
    product = {
        'weight': soup.find(text='\nItem Weight\n').parent.find_next_sibling().text,
        'dimension': soup.find(text='\nProduct Dimensions\n').parent.find_next_sibling().text
    }
except AttributeError:  # find() or find_next_sibling() returned None
    product = {
        'weight': 'item unavailable',
        'dimension': 'item unavailable'
    }

This will give:

{'weight': '\n0.16 ounces\n', 'dimension': '\n4.8 x 3.4 x 0.5 inches\n'}

Now if you want to remove those newline characters, you can use either .replace('\n', '') or .strip() when grabbing the text.
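If you would rather not depend on the exact whitespace around each label, one option is to compare the stripped header text instead. The sketch below is only an illustration. The original HTML is not shown in the post, so the th/td table markup is an assumption, and extract_spec is a hypothetical helper name. It also uses the built-in html.parser rather than lxml to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's real HTML is not available, so this
# <th>/<td> layout is a guess based on the soup.find_all('th') call above.
content = """
<table>
  <tr><th>
Item Weight
</th><td>
0.16 ounces
</td></tr>
  <tr><th>
Product Dimensions
</th><td>
4.8 x 3.4 x 0.5 inches
</td></tr>
  <tr><th>
Batteries Included?
</th><td>
No
</td></tr>
</table>
"""


def extract_spec(soup, label, default="item unavailable"):
    """Return the stripped text of the <td> following the <th> named label.

    Hypothetical helper: compares whitespace-stripped header text, so the
    newlines around each label no longer matter.
    """
    for th in soup.find_all("th"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            if td is not None:
                return td.get_text(strip=True)
    return default


soup = BeautifulSoup(content, "html.parser")
product = {
    "weight": extract_spec(soup, "Item Weight"),
    "dimension": extract_spec(soup, "Product Dimensions"),
}
print(product)
# {'weight': '0.16 ounces', 'dimension': '4.8 x 3.4 x 0.5 inches'}
```

Because the helper returns a default instead of raising, the try/except is no longer needed, and a missing row affects only its own key rather than resetting the whole dictionary.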
