Reputation: 15629
I'm having trouble extracting the information for a specific tag using BeautifulSoup. I would like to extract the text for 'Item 4' from inside its <html> tag, but the code below returns the text for 'Item 1' instead. What am I doing incorrectly (e.g., slicing)?
Code:
primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
    if item_4:
        for item_4_content in page_section.find('html'):
            print(item_4_content)
HTML:
<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 2</h3>
<html><body><p>Item 2 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 3</h3>
<html><body><p>Item 3 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>
Upvotes: 2
Views: 2217
Reputation: 5157
It looks like you want to print the content of the <p> tag that corresponds to a given <h3> text value, correct?
Your code must:
1. Load the html_source.
2. Find all 'div' tags whose 'class' is equal to 'detail-item'.
3. For each one, check whether the .text value of its <h3> tag is equal to the string 'Item 4'.
4. If it is, print the .text value of the corresponding <p> tag.
You can achieve what you want by using the following code.
Code:
s = '''<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 2</h3>
<html><body><p>Item 2 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 3</h3>
<html><body><p>Item 3 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, 'lxml')
primary_detail = soup.find_all('div', {'class': 'detail-item'})
for tag in primary_detail:
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)
Output:
Item 4 text here
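As a side note on why the original attempt printed 'Item 1': both find() calls were made on page_section, so find('html') always returned the first match in the whole section, regardless of which <h3> had been found. A minimal sketch of an alternative that starts from the matching <h3> and searches only inside its own div is shown below. The helper name text_for_heading and the shortened sample markup are made up for illustration, and 'html.parser' is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Shortened sample markup in the same shape as the question's HTML
html_doc = """
<div class="detail-item">
<h3>Item 1</h3>
<html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
<h3>Item 4</h3>
<html><body><p>Item 4 text here</p></body></html>
</div>
"""

def text_for_heading(soup, heading):
    """Return the <p> text from the detail-item div whose <h3> equals heading.

    Hypothetical helper: returns None when no matching heading or <p> exists.
    """
    h3 = soup.find('h3', string=heading)
    if h3 is None:
        return None
    # Search only inside the div that contains this heading,
    # not from the top of the document.
    container = h3.find_parent('div', {'class': 'detail-item'})
    if container is None or container.p is None:
        return None
    return container.p.get_text(strip=True)

soup = BeautifulSoup(html_doc, 'html.parser')
print(text_for_heading(soup, 'Item 4'))
```

Scoping the second lookup to the parent div is what keeps the result tied to the heading that was matched.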
EDIT: On the website you provided, the first 'detail-item' div in the loop doesn't have an <h3> tag, only an <h2>, so tag.h3 is None and has no .text value, correct?
You can work around this error with a try/except clause, as in the following code.
Code:
from bs4 import BeautifulSoup
import requests

url = 'https://fortiguard.com/psirt/FG-IR-17-097'
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, 'lxml')
primary_detail = soup.find_all('div', {'class': 'detail-item'})
for tag in primary_detail:
    try:
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        # tag.h3 (or tag.p) is None when the div has no such child tag
        continue
If an exception is raised, the loop continues with the next element. The code therefore skips the first item, which has no <h3> tag and so no .text value.
Output:
Upgrade to FortiWLC-SD version 8.3.0
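If you would rather not rely on exceptions, an explicit None check does the same job. This is only a sketch: the function name find_section_text and the inline sample markup are invented for illustration (it is not the real page content), and 'html.parser' is used so the example runs without lxml or a network request.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first div has an <h2> instead of an <h3>
sample = """
<div class="detail-item"><h2>Summary</h2><p>Summary text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade to version X</p></div>
"""

def find_section_text(soup, heading):
    """Return the <p> text of the first detail-item whose <h3> contains heading."""
    for tag in soup.find_all('div', {'class': 'detail-item'}):
        # Skip divs that lack an <h3> or a <p> instead of catching an exception
        if tag.h3 is None or tag.p is None:
            continue
        if heading in tag.h3.text:
            return tag.p.text
    return None

soup = BeautifulSoup(sample, 'html.parser')
print(find_section_text(soup, 'Solutions'))
```

The explicit check makes it clear which case is being skipped, whereas try/except would also hide other attribute errors inside the block.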
Upvotes: 3