Reputation: 33
My HTML looks like this:
<body>
<div class="afds">
<span class="dfsdf">mytext</span>
</div>
<div class="sdf dzf">
<h1>some random text</h1>
</div>
</body>
I want to find all tags whose text contains "text", along with their corresponding classes. In this case, I want:
Next, I want to be able to navigate from the returned tags. For example, for each returned tag, find its parent div tag and that div's classes.
If I run the following:
soupx.find_all(text=re.compile(".*text.*"))
it returns only the text content of the tags, not the tags themselves:
['mytext', ' some random text']
Please help.
Upvotes: 2
Views: 224
Reputation: 24930
You are probably looking for something along these lines:
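import re

# soup is the BeautifulSoup object built from the HTML in the question
ts = soup.find_all(text=re.compile(".*text.*"))
for t in ts:
    # t.parent is the tag that directly contains the matched string
    if len(t.parent.attrs) > 0:
        for k in t.parent.attrs.keys():
            # prints the first value of each attribute (for class, the first class name)
            print(t.parent.name, t.parent.attrs[k][0])
    else:
        print(t.parent.name, "null")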
Output:
span dfsdf
h1 null
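The loop above prints only the first value of each attribute, so a tag with several classes would show just one of them. A minimal sketch of a variant that returns every class is below. The helper name tags_with_classes is hypothetical, the HTML is copied from the question, and it assumes a Beautiful Soup version that accepts the string= argument (the newer spelling of text=).

```python
import re
from bs4 import BeautifulSoup

# HTML copied from the question
html = """<body>
<div class="afds">
<span class="dfsdf">mytext</span>
</div>
<div class="sdf dzf">
<h1>some random text</h1>
</div>
</body>"""

soup = BeautifulSoup(html, "html.parser")


def tags_with_classes(soup, pattern):
    """Return (tag name, list of classes) for each tag that directly holds matching text.

    Hypothetical helper, not part of Beautiful Soup.
    """
    results = []
    for node in soup.find_all(string=re.compile(pattern)):
        tag = node.parent  # the tag that directly contains the matched string
        # "class" is multi-valued, so Beautiful Soup stores it as a list;
        # tags without a class attribute fall back to an empty list
        results.append((tag.name, tag.get("class", [])))
    return results


for name, classes in tags_with_classes(soup, "text"):
    print(name, " ".join(classes) or "null")
```

Because find_all() matches the compiled pattern with a search rather than a full match, "text" behaves the same as ".*text.*" here.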
Upvotes: 2
Reputation: 150
find_all() does not return plain strings; it returns bs4.element.NavigableString objects. That means you can call other Beautiful Soup methods on those results.
Have a look at find_parent() and find_parents() in the documentation:
childs = soupx.find_all(text=re.compile(".*text.*"))
for c in childs:
    # nearest enclosing <div> of the matched string, or None if there is none
    print(c.find_parent("div"))
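To get the classes of the enclosing div as well, the same idea can be extended as in the sketch below. The function name enclosing_divs is hypothetical, the HTML is taken from the question, and string= is assumed to be available in the installed Beautiful Soup version.

```python
import re
from bs4 import BeautifulSoup

# HTML copied from the question
html = """<body>
<div class="afds">
<span class="dfsdf">mytext</span>
</div>
<div class="sdf dzf">
<h1>some random text</h1>
</div>
</body>"""


def enclosing_divs(html, pattern):
    """Return (tag name, classes of the nearest enclosing div) per matching string.

    Hypothetical helper; the second item is None when no div encloses the match.
    """
    soupx = BeautifulSoup(html, "html.parser")
    rows = []
    for c in soupx.find_all(string=re.compile(pattern)):
        div = c.find_parent("div")  # walks up the tree to the closest <div>
        div_classes = div.get("class", []) if div is not None else None
        rows.append((c.parent.name, div_classes))
    return rows


for tag_name, div_classes in enclosing_divs(html, "text"):
    print(tag_name, div_classes)
```

find_parents("div") would return every enclosing div instead of only the closest one, which matters if the divs are nested.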
Upvotes: 1