Reputation: 61
I'm using bs4 (BeautifulSoup) to scrape data from a web page, and I need to discard some elements when a condition is met:
html_code = soup.findAll('tr', class_='class1')
I get:
<tr class=class1> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link3.html">link</a> condition1 </nobr> text </tr>
<tr class=class1> <nobr><a href="link4.html">link</a> </nobr> text </tr>
I want to drop the element that contains "condition1" and keep the others:
<tr class=class1> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class=class1> <nobr><a href="link4.html">link</a> </nobr> text </tr>
What is the best way to do it?
Another question: is Scrapy better than bs4?
Upvotes: 0
Views: 409
Reputation: 3859
You can filter the result with a list comprehension that keeps only the rows whose text does not contain "condition1":
html_code = [x for x in soup.findAll('tr', class_='class1') if 'condition1' not in x.text]
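For a version you can run end to end, here is a minimal sketch built on the sample rows from the question. The SAMPLE_HTML string, the rows_without helper, and the choice of the built-in "html.parser" backend are assumptions made for illustration; they are not part of the original post.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the rows in the question (illustrative only).
SAMPLE_HTML = """
<tr class="class1"> <nobr><a href="link1.html">link</a> </nobr> text </tr>
<tr class="class1"> <nobr><a href="link2.html">link</a> </nobr> text </tr>
<tr class="class1"> <nobr><a href="link3.html">link</a> condition1 </nobr> text </tr>
<tr class="class1"> <nobr><a href="link4.html">link</a> </nobr> text </tr>
"""


def rows_without(soup, marker, css_class="class1"):
    """Return the <tr> tags of the given class whose text lacks `marker`.

    Hypothetical helper name; it wraps the same list-comprehension filter.
    """
    return [
        row
        for row in soup.find_all("tr", class_=css_class)
        if marker not in row.get_text()
    ]


# html.parser ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
kept = rows_without(soup, "condition1")

# Collect the href of the first link in each surviving row.
links = [row.a["href"] for row in kept]
print(links)
```

Note that this only builds a filtered list; the soup itself still holds the unwanted row. If you need it gone from the tree as well, calling decompose() on each unwanted tag is one way to do that.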
Upvotes: 1