Reputation: 175
Given an unordered list in which some list items contain the string
"is", I only want to get the text of those items:
<ul class="fun-facts">
<li>Owned my dream car in high school <a href="#footer"><sup>1</sup></a></li>
<li>Middle name is Ronald</li>
<li>Never had been on a plane until college</li>
<li>Dunkin Donuts coffee is better than Starbucks</li>
<li>A favorite book series of mine is <i>Ender's Game</i></li>
<li>Current video game of choice is <i>Rocket League</i></li>
<li>The band that I've seen the most times live is the <i>Zac Brown Band</i></li>
</ul>
import re
# webpage is a BeautifulSoup object for the page that contains the list above
facts = webpage.select('ul.fun-facts li')
facts_with_is = [fact.find(string=re.compile('is')) for fact in facts]
facts_with_is1 = [fact for fact in facts_with_is if fact]
facts_with_is2 = [fact.find_parent().get_text() for fact in facts_with_is if fact]
facts:
[<li>Owned my dream car in high school <a href="#footer"><sup>1</sup></a></li>, <li>Middle name is Ronald</li>, <li>Never had been on a plane until college</li>, <li>Dunkin Donuts coffee is better than Starbucks</li>, <li>A favorite book series of mine is <i>Ender's Game</i></li>, <li>Current video game of choice is <i>Rocket League</i></li>, <li>The band that I've seen the most times live is the <i>Zac Brown Band</i></li>]
facts_with_is1 (after filtering the None values out of facts_with_is):
['Middle name is Ronald', 'Dunkin Donuts coffee is better than Starbucks', 'A favorite book series of mine is ', 'Current video game of choice is ', "The band that I've seen the most times live is the "]
facts_with_is2:
['Middle name is Ronald', 'Dunkin Donuts coffee is better than Starbucks', "A favorite book series of mine is Ender's Game", 'Current video game of choice is Rocket League', "The band that I've seen the most times live is the Zac Brown Band"]
How can I get the expected result (facts_with_is2) with a simpler approach?
Upvotes: 0
Views: 51
Reputation: 25048
Select all <li> elements and check, in a loop, whether the string ' is ' occurs in each element's text:
from bs4 import BeautifulSoup
html_text='''<ul class="fun-facts">
<li>Owned my dream car in high school <a href="#footer"><sup>1</sup></a></li>
<li>Middle name is Ronald</li>
<li>Never had been on a plane until college</li>
<li>Dunkin Donuts coffee is better than Starbucks</li>
<li>A favorite book series of mine is <i>Ender's Game</i></li>
<li>Current video game of choice is <i>Rocket League</i></li>
<li>The band that I've seen the most times live is the <i>Zac Brown Band</i></li>
</ul>'''
soup = BeautifulSoup(html_text, 'lxml')
# keep the text of every <li> whose text contains " is " (with surrounding spaces)
[x.get_text() for x in soup.select('ul.fun-facts li') if ' is ' in x.get_text()]
['Middle name is Ronald',
'Dunkin Donuts coffee is better than Starbucks',
"A favorite book series of mine is Ender's Game",
'Current video game of choice is Rocket League',
"The band that I've seen the most times live is the Zac Brown Band"]
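The ' is ' check only matches "is" with a space on both sides, so an item that ends with "is" (or where "is" is followed by punctuation) would be skipped. If "is" should count as a whole word, a regex with \b word boundaries is one option. A minimal sketch, assuming a hypothetical helper named items_with_word, made-up sample items, and the built-in 'html.parser'; it also calls get_text() only once per item:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample items chosen to show the edge cases
html_text = '''<ul class="fun-facts">
<li>Middle name is Ronald</li>
<li>This history lesson</li>
<li>The answer is</li>
</ul>'''

def items_with_word(soup, selector, word):
    """Hypothetical helper: text of every element matching `selector`
    that contains `word` as a whole word (case-sensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    texts = (el.get_text() for el in soup.select(selector))  # get_text() once per item
    return [text for text in texts if pattern.search(text)]

soup = BeautifulSoup(html_text, 'html.parser')
print(items_with_word(soup, 'ul.fun-facts li', 'is'))
# ['Middle name is Ronald', 'The answer is']
```

"This history lesson" is excluded because both occurrences of "is" sit inside longer words, and "The answer is" is kept even though no space follows "is". On the list from the question, this returns the same five items as the ' is ' check.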
Upvotes: 1