BeautifulSoup not extracting specific tag text

Question

I'm having a problem extracting the information for a specific tag using BeautifulSoup. I would like to extract the text in the paragraph under the 'Item 4' heading, but the code below returns the text for 'Item 1' instead. What am I doing wrong (e.g., slicing)?

Code:

primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
  if item_4:
    for item_4_content in page_section.find('html'):
      print (item_4_content)

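HTML:

<div class="detail-item">
   <h3>Item 1</h3>
   <p>Item 1 text here</p>
</div>

<div class="detail-item">
   <h3>Item 2</h3>
   <p>Item 2 text here</p>
</div>

<div class="detail-item">
   <h3>Item 3</h3>
   <p>Item 3 text here</p>
</div>

<div class="detail-item">
   <h3>Item 4</h3>
   <p>Item 4 text here</p>
</div>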
dot.Py · Accepted Answer

It looks like you want to print the <p> tag content based on the text value of the <h3> tag, correct?
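A likely reason the question's code shows 'Item 1': find() returns a single Tag (or None), not a list of matches, and a for loop over a Tag walks that tag's children. The inner find('html') never refers to the heading that was found, so it starts again at the top of the document, where Item 1 comes first. The sketch below demonstrates both points on a trimmed copy of the sample markup, using the built-in 'html.parser' backend.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup: two items are enough to show the effect
html = (
    '<div class="detail-item"><h3>Item 1</h3><p>Item 1 text here</p></div>'
    '<div class="detail-item"><h3>Item 4</h3><p>Item 4 text here</p></div>'
)
soup = BeautifulSoup(html, 'html.parser')

# find() returns one Tag; looping over it yields that tag's children
heading = soup.find('h3', string='Item 4')
children = [str(child) for child in heading]
print(children)  # ['Item 4']

# A lookup that starts from the document, not from `heading`, reaches Item 1 first
first_paragraph = soup.find('p').text
print(first_paragraph)  # Item 1 text here
```

The fix is therefore to search relative to the matching heading (or its enclosing div), not to re-scan the whole document.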

Your code must:

1. load the HTML source;
2. search for all 'div' tags whose 'class' is 'detail-item';
3. for each occurrence, check whether the .text value of its <h3> tag contains the string 'Item 4';
4. if it does, print the .text value of the corresponding <p> tag.

You can achieve what you want by using the following code.

Code:

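from bs4 import BeautifulSoup

s = '''
<div class="detail-item">
   <h3>Item 1</h3>
   <p>Item 1 text here</p>
</div>

<div class="detail-item">
   <h3>Item 2</h3>
   <p>Item 2 text here</p>
</div>

<div class="detail-item">
   <h3>Item 3</h3>
   <p>Item 3 text here</p>
</div>

<div class="detail-item">
   <h3>Item 4</h3>
   <p>Item 4 text here</p>
</div>
'''

# 'lxml' requires the lxml package; 'html.parser' also works for this snippet
soup = BeautifulSoup(s, 'lxml')

# Each item lives in its own <div class="detail-item"> block
primary_detail = soup.find_all('div', {'class': 'detail-item'})

for tag in primary_detail:
    # Match on the block's <h3> heading, then print the <p> in the same block
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)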

Output:

Item 4 text here
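An alternative sketch that stays closer to the question's original idea: look up the <h3> whose string is exactly 'Item 4', then step to the next <p> sibling with find_next_sibling(). It assumes the same <div class="detail-item"> markup as above and uses the built-in 'html.parser' backend so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

s = '''
<div class="detail-item">
   <h3>Item 1</h3>
   <p>Item 1 text here</p>
</div>
<div class="detail-item">
   <h3>Item 4</h3>
   <p>Item 4 text here</p>
</div>
'''

soup = BeautifulSoup(s, 'html.parser')

item_4_text = None

# string= compares the heading's whole string, so 'Item 4' will not match 'Item 40'
heading = soup.find('h3', string='Item 4')

if heading is not None:
    # The paragraph is the heading's next <p> sibling inside the same <div>
    paragraph = heading.find_next_sibling('p')
    if paragraph is not None:
        item_4_text = paragraph.text

print(item_4_text)  # Item 4 text here
```

The exact string= match is stricter than the substring test ('Item 4' in tag.h3.text) used above, which would also accept a heading such as 'Item 40'.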

EDIT: On the provided website, the first occurrence in the loop doesn't have an <h3> tag, so tag.h3 is None and has no .text value; reading it raises an AttributeError, correct?

You can work around this error with a try/except clause, as in the following code.

Code:

from bs4 import BeautifulSoup
import requests


url = 'https://fortiguard.com/psirt/FG-IR-17-097'
html_source = requests.get(url).text

soup = BeautifulSoup(html_source, 'lxml')

primary_detail = soup.find_all('div', {'class': 'detail-item'})

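for tag in primary_detail:
    try:
        # tag.h3 is None for blocks without an <h3>, so .text raises AttributeError
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        # Skip blocks that are missing the <h3> (or <p>) tag
        continue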

If reading tag.h3.text raises an exception, the loop moves on to the next element, so the first item, which has no <h3> tag, is skipped.

Output:

Upgrade to FortiWLC-SD version 8.3.0
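Rather than catching the exception, you can also check up front whether the <h3> exists: tag.h3 is None when a block has no <h3>, and reading .text from None is what raises the AttributeError. The sketch below uses a small hypothetical sample in place of the live page (the markup is an assumption, not copied from fortiguard.com) and the built-in 'html.parser' backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the live page: the first block has no <h3>
s = '''
<div class="detail-item"><p>No heading in this block</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade to the fixed version</p></div>
'''

soup = BeautifulSoup(s, 'html.parser')

results = []
for tag in soup.find_all('div', {'class': 'detail-item'}):
    heading = tag.h3  # None when the block has no <h3>
    if heading is None or 'Solutions' not in heading.text:
        continue  # skip blocks without a matching heading
    if tag.p is not None:
        results.append(tag.p.text)

for text in results:
    print(text)  # Upgrade to the fixed version
```

Checking for None only skips blocks that lack the tag, whereas a bare except would also silently hide unrelated errors inside the loop.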
