Reputation: 3425
I'm using Beautiful Soup 4 to scrape a page. There's a block of text I don't want:
<p class="MsoNormal" style="text-align: center"><b>
<span lang="EN-US" style="font-family: Arial; color: blue">
<font size="4">1 </font></span>
<span lang="AR-SA" dir="RTL" style="font-family: Arial; color: blue">
<font size="4">ـ</font></span><span lang="EN-US" style="font-family: Arial; color: blue"><font size="4">
сүрә фатиһә</font></span></b></p>
What makes it unique is that it contains a <b> tag. I already used findall() to get all the
<p> tags, so now I have a for loop like:
for el in doc.findall('p'):
    if el.hasChildTag('b'):
        break
Unfortunately, bs4 has no "hasChildTag" function.
Upvotes: 2
Views: 3067
Reputation: 80346
for elem in soup.findAll('p'):
    if elem.findChildren('b'):
        continue  # skip the elem with "b" and continue with the loop
    # do stuff with the elem
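For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. It assumes bs4 is installed and uses Python's built-in html.parser; the sample markup and the helper name paragraphs_without_bold are made up for illustration and are not from the question. It uses find_all() and find(), the current spellings of the older camelCase methods.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the asker's real page.
html = """
<p class="MsoNormal"><b><span>1</span></b></p>
<p>keep me</p>
<p>keep <i>me</i> too</p>
"""

soup = BeautifulSoup(html, "html.parser")


def paragraphs_without_bold(soup):
    """Return every <p> that has no <b> descendant (hypothetical helper)."""
    kept = []
    for p in soup.find_all("p"):
        # find() searches all descendants and returns None if nothing matches
        if p.find("b") is not None:
            continue  # skip paragraphs that contain a <b>
        kept.append(p)
    return kept


texts = [p.get_text() for p in paragraphs_without_bold(soup)]
print(texts)
```

Note that find() looks at all descendants, not only direct children. If only direct children should count, p.find("b", recursive=False) restricts the search to them.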
Upvotes: 2
Reputation: 3059
It should also be possible to use CSS selectors.
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
soup.select("p b")
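One caveat: "p b" matches the <b> elements themselves, not the enclosing <p> elements. Below is a rough sketch of two ways to get at the paragraphs. It assumes bs4 4.7 or later, where select() is backed by the soupsieve package and understands :has() and :not(); the sample markup and ids are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with ids so the results are easy to read.
html = """
<p id="title"><b>1</b></p>
<p id="body">plain text</p>
"""

soup = BeautifulSoup(html, "html.parser")

# "p b" returns the <b> tags nested anywhere inside a <p>
bold_tags = soup.select("p b")

# Walk up from each match to its enclosing <p>
with_bold = [b.find_parent("p")["id"] for b in bold_tags]

# Select the <p> elements directly: those with no <b> descendant
without_bold = [p["id"] for p in soup.select("p:not(:has(b))")]

print(with_bold, without_bold)
```

The second selector is probably closer to what the question asks for, since it yields the paragraphs to keep in a single call.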
Upvotes: 3