Reputation: 4291
I have the following html code:
soup = BeautifulSoup("<div class='mydiv'><p style='xyz'>123</p><p>456</p><p style='xyz'>789</p><p>abc</p></div>", 'lxml')
Now, I have the text '456'.
I want to find the text of all the tags that have the same tag name and attributes as the tag containing '456'.
That is, since <p>456</p> contains 456, we should also find abc because of <p>abc</p>, but not 123 or 789, because <p style='xyz'>123</p> and <p style='xyz'>789</p> carry a style attribute.
Note that <p> above could be any other tag, such as <div>, so searching with soup.find('p') should be avoided.
The final result should be [456, abc].
It is a bit complicated.
How can we solve this problem?
Thanks.
Upvotes: 1
Views: 1404
Reputation: 195428
This script prints all tags that share the tag name and tag attributes of the tag that contains the string "456":
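from bs4 import BeautifulSoup

txt = '''
<div class='mydiv'>
<p style='xyz'>123</p>
<p>456</p>
<p style='xyz'>789</p>
<p>abc</p>
</div>'''
text_to_find = '456'
soup = BeautifulSoup(txt, 'html.parser')
# Reference tag: the first tag whose first child is exactly the searched text
tmp = soup.find(lambda t: t.contents and t.contents[0] == text_to_find)
if tmp:
    # Every tag with the same name and the same attributes as the reference tag
    for tag in soup.find_all(lambda t: t.name == tmp.name and t.attrs == tmp.attrs):
        print(tag)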
Prints:
<p>456</p>
<p>abc</p>
For the input "123", it prints:
<p style="xyz">123</p>
<p style="xyz">789</p>
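Since the question asks for the texts rather than the tags, here is a minimal sketch that wraps the same idea in a function and returns a list of strings. The function name `texts_of_similar_tags` is made up for illustration, and the sketch assumes the searched text is the first child of its tag, as in the script above.

```python
from bs4 import BeautifulSoup


def texts_of_similar_tags(html, text_to_find):
    """Return the text of every tag sharing name and attributes with the
    first tag whose first child equals text_to_find (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Reference tag: first tag whose first child is exactly the searched text
    ref = soup.find(lambda t: t.contents and t.contents[0] == text_to_find)
    if ref is None:
        return []
    # Same tag name and identical attribute dict as the reference tag
    return [t.get_text() for t in soup.find_all(ref.name) if t.attrs == ref.attrs]


html = ("<div class='mydiv'><p style='xyz'>123</p><p>456</p>"
        "<p style='xyz'>789</p><p>abc</p></div>")
print(texts_of_similar_tags(html, "456"))
print(texts_of_similar_tags(html, "123"))
```

Nothing here hard-codes `p`, so the same call should work when the matching tag is a `div` or anything else.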
Upvotes: 1
Reputation: 910
Try:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class='mydiv'><p style='xyz'>123</p><p>456</p><p style='xyz'>789</p><p>abc</p></div>", 'html5lib')
tags = soup.find_all()
for tag in tags:
    # Remove every tag that carries a style attribute
    if tag.get('style'):
        tag.extract()
# Print the text that is left, one string per line
for tag in soup.select('html body'):
    print(tag.get_text('\n'))
prints:
456
abc
Upvotes: 0
Reputation: 582
There are actually multiple ways. Here are two examples of how you could find what you are looking for:
from bs4 import BeautifulSoup
soup = BeautifulSoup("<div class='mydiv'><p style='xyz'>123</p><p>456</p><p style='xyz'>789</p><p>abc</p></div>", 'lxml')
# Find all tags first and then look for the one matching your string
found = [x for x in soup.find_all() if x.text == "456"]
for p in found:
    print(p)
# Using find_all functionality directly
found = soup.find_all(string="456")
for p in found:
    print(p)
Prints:
<p>456</p>
456
Note, however, that with the second method you receive NavigableString objects and not Tag objects!
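If you need the enclosing tag after a string search, `.parent` gets you from the NavigableString back to its Tag. A small sketch on a cut-down document (the shortened HTML is only for illustration):

```python
from bs4 import BeautifulSoup, NavigableString, Tag

soup = BeautifulSoup("<div><p>456</p><p>abc</p></div>", "html.parser")

# A string search returns the text node itself, not the tag around it
node = soup.find(string="456")
print(isinstance(node, NavigableString))

# .parent is the tag that directly contains the text node
parent = node.parent
print(isinstance(parent, Tag))
print(parent)
```

From `parent` you can then read `parent.name` and `parent.attrs` to look for similar tags.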
Upvotes: 0