QuestionSpree

Reputation: 33

Find unknown tag containing given text

My HTML looks like this:

<body>
  <div class="afds">
    <span class="dfsdf">mytext</span>
  </div>
  <div class="sdf dzf">
    <h1>some random text</h1>
  </div>
</body>

I want to find all tags whose text contains "text", along with their corresponding classes. In this case, I want:

Next, I want to be able to navigate from the returned tags. For example, I want to find the parent div tag, and its classes, for each returned tag.

If I execute the following:

soupx.find_all(text=re.compile(".*text.*"))

it simply returns the text part of the tags:

['mytext', ' some random text']

Please help.

Upvotes: 2

Views: 224

Answers (2)

Jack Fleeting

Reputation: 24930

You are probably looking for something along these lines:

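import re

# soup is the parsed BeautifulSoup document.
# Find every string containing "text", then inspect the tag that holds it.
ts = soup.find_all(text=re.compile(".*text.*"))
for t in ts:
    # "class" is multi-valued, so get() returns a list (or None if absent).
    classes = t.parent.get("class")
    if classes:
        print(t.parent.name, " ".join(classes))
    else:
        print(t.parent.name, "null")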

Output:

span dfsdf
h1 null
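A side note on why the class lookup needs some care: BeautifulSoup treats class as a multi-valued attribute and returns it as a list, while most other attributes come back as plain strings, and a missing attribute gives None with get(). A minimal sketch, assuming a made-up id="main" attribute that is not in the question's HTML:

```python
from bs4 import BeautifulSoup

# The id attribute here is hypothetical; it only illustrates a single-valued attribute.
snippet = '<div id="main" class="sdf dzf"><h1>some random text</h1></div>'
soup = BeautifulSoup(snippet, "html.parser")

class_value = soup.div["class"]     # multi-valued attribute: a list of class names
id_value = soup.div["id"]           # single-valued attribute: a plain string
h1_classes = soup.h1.get("class")   # attribute absent: get() returns None

print(class_value, id_value, h1_classes)
```

So indexing an attribute value with [0] gives the first class name for class, but only the first character for a string attribute such as id.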

Upvotes: 2

elmo26

Reputation: 150

When called with the text argument, find_all() does not return plain strings; it returns bs4.element.NavigableString objects, which stay attached to the parse tree. That means you can call BeautifulSoup's navigation methods on those results.

Have a look at find_parent() and find_parents() in the documentation:

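children = soupx.find_all(text=re.compile(".*text.*"))
for c in children:
    # Nearest enclosing <div> of each matched string, with its classes.
    div = c.find_parent("div")
    print(div.name, div.get("class"))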
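Putting this together for the HTML in the question, here is a minimal self-contained sketch. The helper name describe_matches is made up for illustration, and string= is used as the newer spelling of text= (available in Beautiful Soup 4.4.0 and later), so adjust if you are on an older version:

```python
import re
from bs4 import BeautifulSoup

html = """
<body>
  <div class="afds">
    <span class="dfsdf">mytext</span>
  </div>
  <div class="sdf dzf">
    <h1>some random text</h1>
  </div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")


def describe_matches(soup, pattern):
    """Return (tag name, tag classes, parent div classes) per matching string.

    Hypothetical helper, not part of BeautifulSoup.
    """
    rows = []
    for s in soup.find_all(string=re.compile(pattern)):
        tag = s.parent                # the tag that directly holds the string
        div = s.find_parent("div")    # nearest enclosing <div>, or None
        div_classes = div.get("class", []) if div else None
        rows.append((tag.name, tag.get("class", []), div_classes))
    return rows


results = describe_matches(soup, "text")
for row in results:
    print(row)
```

Each row pairs the tag holding the text with the classes of its nearest div ancestor; a tag without a class attribute gets an empty list.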

Upvotes: 1
