Reputation: 1
Let's use the word "technology" as an example. I want to search all the text on a webpage, find every element whose text contains the word "technology", and print only the contents of those elements. Please help me figure this out.
words = soup.body.get_text()
for word in words:
    i = word.soup.find_all("technology")
    print(i)
Upvotes: 0
Views: 2647
Reputation: 473903
You should search by text, which can be done with the text argument (renamed to string in modern BeautifulSoup versions), either via a function with a substring check:
for element in soup.find_all(text=lambda text: text and "technology" in text):
    print(element.get_text())
Or, via a regular expression pattern:
import re
for element in soup.find_all(text=re.compile("technology")):
    print(element.get_text())
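Note that a text search returns the matched strings themselves, not the tags around them. If what you want is the contents of the enclosing tag, as the question asks, one option is to go through each match's .parent. Here is a minimal, self-contained sketch of that idea, assuming BeautifulSoup 4.4 or later (for the string argument). The sample HTML and the helper name texts_of_tags_containing are made up for illustration, and the match is case-sensitive:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration.
html = """
<html><body>
  <h1>Technology news</h1>
  <p>New technology is changing how we work.</p>
  <p>Nothing relevant here.</p>
  <div><span>Older technology still matters.</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def texts_of_tags_containing(soup, word):
    """Return the text of each tag that directly contains `word` (hypothetical helper)."""
    # Each match is a NavigableString (a text node), not a tag.
    matches = soup.find_all(string=lambda s: s and word in s)
    # .parent is the tag that immediately encloses the matched text node.
    return [s.parent.get_text(strip=True) for s in matches]


results = texts_of_tags_containing(soup, "technology")
for text in results:
    print(text)
```

The heading is skipped here because "Technology" is capitalized. For a case-insensitive search, lowercase both sides in the lambda, or use re.compile("technology", re.IGNORECASE) as the string argument instead.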
Upvotes: 2
Reputation: 575
Since you are looking for data inside an HTML structure rather than a typical data structure, you will more or less have to write an HTML parser for this job. Python doesn't normally know that "some string here" relates to another string wrapped in brackets somewhere else.
There may be a library for this, but I have a feeling that there isn't :(
Upvotes: 0