disable0

Reputation: 61

Python - BeautifulSoup: removing an element based on text inside a specific tag

I'm using bs4 to scrape data from a web page, and I need to discard some elements when a condition is met:

html_code = soup.findAll('tr', class_='class1')

I get:

<tr class=class1> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link3.html">link</a> condition1 </nobr> text </tr>
<tr class=class1> <nobr><a href="link4.html">link</a> </nobr> text </tr>

I want to drop the row containing "condition1" and keep the others:

<tr class=class1> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link4.html">link</a> </nobr> text </tr>

What is the best way to do this?

Another question: is Scrapy better than bs4 for this?

Upvotes: 0

Views: 409

Answers (1)

gipsy

Reputation: 3859

You can filter the rows with a list comprehension:

# keep only the rows whose text does not contain 'condition1'
html_code = [x for x in soup.find_all('tr', class_='class1') if 'condition1' not in x.text]
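The comprehension above tests the text of the whole row, so a row whose trailing text (outside `<nobr>`) happened to contain "condition1" would be dropped as well. Since the question is about a condition inside a specific tag, here is a minimal sketch that checks only the `<nobr>` text. It assumes the sample markup from the question is representative and that `<nobr>` is the tag that matters. It also shows a variant that uses `decompose()` to remove the matching rows from the parse tree instead of just skipping them. The helper names `has_marker`, `keep_rows` and `drop_rows` are hypothetical, chosen for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in a <table>.
SAMPLE_HTML = """
<table>
<tr class=class1> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link3.html">link</a> condition1 </nobr> text </tr>
<tr class=class1> <nobr><a href="link4.html">link</a> </nobr> text </tr>
</table>
"""


def has_marker(row, marker="condition1"):
    """True if `marker` appears in the row's <nobr> text (not elsewhere in the row)."""
    nobr = row.find("nobr")
    return nobr is not None and marker in nobr.get_text()


def keep_rows(soup, marker="condition1"):
    """Return the class1 rows without the marker; the soup itself is not modified."""
    return [row for row in soup.find_all("tr", class_="class1")
            if not has_marker(row, marker)]


def drop_rows(soup, marker="condition1"):
    """Remove the class1 rows that contain the marker from the parse tree, in place."""
    # find_all() returns a list, so decomposing while iterating over it is safe.
    for row in soup.find_all("tr", class_="class1"):
        if has_marker(row, marker):
            row.decompose()
    return soup


# html.parser keeps the <tr>/<nobr> nesting exactly as written in the sample.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print([row.a["href"] for row in keep_rows(soup)])
```

Use `keep_rows` when you only need the remaining rows for further processing, and `drop_rows` when you want to serialize or keep working with a soup that no longer contains the unwanted rows.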

Upvotes: 1
