Reputation: 65
As the title says, I'm trying to figure out how to find text inside attributes using BeautifulSoup. Consider the example below. Given the HTML:
<html>
<head>
</head>
<body>
<input class="form-control" name="searchString" type="text" value="myString"/>
<h2> your string is myString</h2>
</body>
</html>
I'm trying to find all the tags that contain the text "myString", so I tried the following:
soup = BeautifulSoup(doc, "html.parser")
soup.find_all(text=re.compile("myString"))
Unfortunately, it returns just one result:
[' your string is myString']
It ignores the input whose value attribute contains the string I searched for. Any suggestions? Thanks in advance.
Is there a generic way to get the tag if I don't know that my string appears in the value attribute? It could appear in any other attribute, or even in an onClick handler. How can I search for my string without knowing where it appears?
For the onClick case, for example, I would have to write soup.find_all(onclick=re.compile("myString"))
Thanks
Upvotes: 0
Views: 2031
Reputation: 19154
To search for text inside the value attribute, change text to value:
results = soup.find_all(value=re.compile("myString"))
for r in results:
    # print(r)
    print('value: ' + r.get('value'))
Note that "your string is myString" is not a text attribute; it is the element's textContent, or just text.
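One caveat with the keyword form: some attribute names cannot be passed as keywords. name= is taken by find_all for the tag name, and names with a dash are not valid Python keywords. Here is a minimal sketch using the attrs dictionary instead. The data-hint attribute is made up for illustration and is not in the question's HTML.

```python
import re
from bs4 import BeautifulSoup

# data-hint is a hypothetical attribute added only for this illustration
html = ('<input class="form-control" name="searchString" type="text" '
        'value="myString" data-hint="myString here"/>')
soup = BeautifulSoup(html, "html.parser")

# name= would be read as the tag name, so the attribute goes through attrs=
by_name = soup.find_all(attrs={"name": re.compile("search")})

# data-hint is not a valid Python keyword, so it also goes through attrs=
by_data = soup.find_all(attrs={"data-hint": re.compile("myString")})

print(by_name[0]["value"])
print(by_data[0]["name"])
```

Both searches should return the same input tag here.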
To search for the string anywhere in a tag, whether in its text or in its attributes, convert the bs4.element to a string (its outerHTML):
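results = soup.find_all(True)
for r in results:
    if 'myString' in str(r):
        print(r)
# Matches include:
# <input class="form-control" name="searchString" type="text" value="myString"/>
# <h2> your string is myString</h2>
# Ancestor tags such as <html> and <body> are printed too, because their markup contains the string.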
And to search attribute values only:
# <input class="myString bold" name="searchString" type="text" value="myString"/>
results = soup.find_all(True)
for r in results:
    for attr in r.attrs:
        attrValue = r[attr]
        # class is parsed into a list of names, so join it back into one string
        if 'class' == attr:
            attrValue = ' '.join(attrValue)
        if 'myString' in attrValue:
            print('%s : %s' % (attr, attrValue))
# class : myString bold
# value : myString
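If you want this as a single find_all call, the loops above could be folded into a function filter, since find_all also accepts a function that takes a tag and returns True or False. This is only a sketch: mentions is a helper name I made up, and the filter checks just the tag's own direct text so that ancestors like html and body do not match.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<html>
<head>
</head>
<body>
<input class="form-control" name="searchString" type="text" value="myString"/>
<h2> your string is myString</h2>
</body>
</html>"""


def mentions(needle):
    """Build a find_all filter (hypothetical helper, not part of bs4)."""
    def _filter(tag):
        for value in tag.attrs.values():
            # Multi-valued attributes such as class are parsed into lists
            if isinstance(value, list):
                value = " ".join(value)
            if needle in value:
                return True
        # Only look at the tag's direct text nodes, not at its descendants
        return any(
            isinstance(child, NavigableString) and needle in child
            for child in tag.children
        )
    return _filter


soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(mentions("myString"))
names = [tag.name for tag in matches]
print(names)
```

With the question's HTML this should pick up the input (through its value attribute) and the h2 (through its text), without needing to know where the string appears.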
Upvotes: 2
Reputation: 186
find_all finds the nodes that contain the item.
Now you need to go through the results and pull out the string you want.
That is how bs is designed: once you have found a match, you can pull something entirely different from inside the tag.
import bs4
import re
html = """
<html>
<head>
</head>
<body>
<input class="form-control" name="searchString" type="text" value="myString"/>
<h2> your string is myString</h2>
</body>
</html>"""
soup = bs4.BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
results = soup.find_all(text=re.compile("myString"))
print([re.findall("myString", result) for result in results]) # <-- here is where you iterate through the results
results.extend(soup.find_all('input', {"class":"form-control"})) # Useful for divs, etc.
print(results[-1]['value'])
# This second set of results can be subscripted
Hope that helps.
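To add to the above: the text search gives back string objects rather than tags, which is why only the second set of results can be subscripted. Here is a rough sketch of getting from each matched string to its enclosing tag with .parent. It assumes a bs4 version that accepts the string= argument (older versions spell it text=).

```python
import re
from bs4 import BeautifulSoup

html = """<body>
<input class="form-control" name="searchString" type="text" value="myString"/>
<h2> your string is myString</h2>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# A text search returns NavigableString objects, not tags
strings = soup.find_all(string=re.compile("myString"))

# .parent walks up from each matched string to the tag that encloses it
tags = [s.parent for s in strings]
tag_names = [t.name for t in tags]
print(tag_names)
```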
Upvotes: 0