IlanL

Reputation: 65

BeautifulSoup: finding text inside attributes

As the title says, I'm trying to figure out how to find text inside attributes using BeautifulSoup. Consider the example below. Given the HTML:

<html>
   <head>
   </head>
   <body>
      <input class="form-control" name="searchString" type="text" value="myString"/>
      <h2> your string is myString</h2>
   </body>
</html>

I'm trying to find all the tags that contain the text "myString", so I tried the following:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(doc, "html.parser")
soup.find_all(text=re.compile("myString"))

Unfortunately, it returns just one result:

[' your string is myString']

It ignores the input whose value attribute contains the string I searched for. Any suggestions? Thanks in advance.

Is there a generic way to get the tag if I don't know that my string appears in the value attribute? It could appear in any other attribute, or even in an onClick handler. How can I search for my string without knowing where it appears? For the onClick case, for example, I would have to write soup.find_all(onclick=re.compile("myString")).

thanks

Upvotes: 0

Views: 2031

Answers (2)

ewwink

Reputation: 19154

To search for text inside the value attribute, change text to value:

results = soup.find_all(value=re.compile("myString"))
for r in results:
   # print(r)
   print('value: ' + r.get('value'))

Note that " your string is myString" is not an attribute named text; it is the tag's text content (textContent in DOM terms).

To check whether the string appears anywhere in a tag, whether in its text or its attributes, convert each bs4 element to a string (its outerHTML):

results = soup.find_all(True)
for r in results:
   if 'myString' in str(r):
       print(r)
       # str(r) includes descendants, so <html> and <body> are printed too, followed by:
       # <input class="form-control" name="searchString" type="text" value="myString"/>
       # <h2> your string is myString</h2>
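
To report only the innermost tags, one option is to check each tag's own attributes and direct text children instead of its whole serialization. This is a minimal sketch, not part of the original answer: `mentions` and `needle` are hypothetical names, and it assumes a plain substring match is what you want.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<html>
   <head>
   </head>
   <body>
      <input class="form-control" name="searchString" type="text" value="myString"/>
      <h2> your string is myString</h2>
   </body>
</html>"""


def mentions(tag, needle="myString"):
    """Hypothetical helper: True if needle is in the tag's own attributes or direct text."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class are parsed into lists.
        if isinstance(value, list):
            value = " ".join(value)
        if needle in str(value):
            return True
    # Look only at text nodes that are immediate children, not at descendants.
    return any(
        isinstance(child, NavigableString) and needle in child
        for child in tag.children
    )


soup = BeautifulSoup(html, "html.parser")
# find_all accepts a function and calls it once per tag.
matches = soup.find_all(mentions)
print([tag.name for tag in matches])  # ['input', 'h2']
```

Because the function is passed straight to find_all, the result is a normal list of tags, so attributes such as matches[0]['value'] remain accessible.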

To search attribute values only:

# example input:
# <input class="myString bold" name="searchString" type="text" value="myString"/>

results = soup.find_all(True)
for r in results:
    for attr, attrValue in r.attrs.items():
        # multi-valued attributes (class, rel, ...) are parsed as lists
        if isinstance(attrValue, list):
            attrValue = ' '.join(attrValue)
        if 'myString' in attrValue:
            print('%s : %s' % (attr, attrValue))
            # class : myString bold
            # value : myString

Upvotes: 2

JBB

Reputation: 186

find_all(text=...) finds the text nodes that contain the item; it does not look at attributes.

Now you need to go through the results and pull out the string you want from each one.

That is how bs is designed: a separate search lets you find something entirely different inside a tag.

import bs4
import re
html = """
<html>
   <head>
   </head>
   <body>
      <input class="form-control" name="searchString" type="text" value="myString"/>
      <h2> your string is myString</h2>
   </body>
</html>"""
soup = bs4.BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
results = soup.find_all(text=re.compile("myString"))
print([re.findall("myString", result) for result in results])  # <-- here is where you iterate through the results

results.extend(soup.find_all('input', {"class":"form-control"}))  # Useful for divs, etc.
print(results[-1]['value'])
# This second set of results can be subscripted
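
Note that the list built above mixes two kinds of objects: text matches are NavigableString objects, while attribute matches are tags. If you want tags throughout, a sketch along these lines may help. It assumes bs4 4.4.0 or later, where `string=` is the preferred spelling of `text=`; the variable names are only illustrative.

```python
import re
from bs4 import BeautifulSoup

html = '<body><input value="myString"/><h2> your string is myString</h2></body>'
soup = BeautifulSoup(html, "html.parser")

# find_all(string=...) yields NavigableString objects, not Tag objects.
text_nodes = soup.find_all(string=re.compile("myString"))
# .parent is the tag that directly contains each matched string.
text_tags = [node.parent for node in text_nodes]

# Attribute matches are already tags, so the two lists can be combined.
attr_tags = soup.find_all(value=re.compile("myString"))
tags = attr_tags + text_tags
print([t.name for t in tags])  # ['input', 'h2']
```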

Hope that helps.

Upvotes: 0
