Omitting specific text using BeautifulSoup

Question

Using BeautifulSoup, I'm attempting to extract some very specific text from a website with a custom lambda function. I'm struggling to pick out exactly what I need while leaving out the stuff I don't need.

    Barron's
    More Bad Times Ahead for These 6 Big Tech Stocks
    May. 10, 2022 at 11:39 a.m. ET

I'm looking to extract just the news headline (in this case "More Bad Times Ahead for These 6 Big Tech Stocks") and leave behind the annoying heading "Barron's".

So far my code looks like this:

for txt in soup.find_all(lambda tag: tag.name == 'h3' and tag.get('class') == ['article__headline']):
    print(txt.text)

I've also tried tag.name == 'a' and tag.get('class') == ['link'], but that returns a load of other stuff from the webpage that I don't need...
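A minimal reproduction of why the loop above also prints "Barron's": the lambda matches the <h3> element itself, and .text concatenates every string nested under it. The page's real HTML isn't shown, so the markup here is an assumption: only the h3 class article__headline and the a class link come from the question; the <span> wrapper and the href are hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed markup: the <span> wrapper and the href are hypothetical.
HTML = """
<h3 class="article__headline">
  <span>Barron's</span>
  <a class="link" href="https://example.com/article">More Bad Times Ahead for These 6 Big Tech Stocks</a>
</h3>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The question's lambda matches the <h3> element itself...
headlines = soup.find_all(
    lambda tag: tag.name == "h3" and tag.get("class") == ["article__headline"]
)

# ...and .text joins every string below it, so "Barron's" comes along too.
# Collapse the whitespace so the combined result is easy to read.
combined = " ".join(headlines[0].text.split())
print(combined)  # Barron's More Bad Times Ahead for These 6 Big Tech Stocks
```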

Andrej Kesely · Accepted Answer

Try the CSS selector h3 a (select all <a> tags that are inside an <h3> tag):

for title in soup.select("h3 a"):
    print(title.text.strip())

Prints:

More Bad Times Ahead for These 6 Big Tech Stocks
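A self-contained sketch of the same idea, run against hypothetical markup (the real page's HTML isn't shown): a headline link inside the <h3>, plus an unrelated link elsewhere that also carries class="link". It contrasts the descendant selector with matching on the class alone, which is what pulled in the extra links in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the headline link sits inside the <h3>, and an
# unrelated link elsewhere on the page shares the "link" class.
HTML = """
<h3 class="article__headline">
  <span>Barron's</span>
  <a class="link" href="https://example.com/article">More Bad Times Ahead for These 6 Big Tech Stocks</a>
</h3>
<a class="link" href="https://example.com/other">Some unrelated link</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "h3 a" uses the descendant combinator: only <a> tags nested inside an <h3>.
titles = [a.text.strip() for a in soup.select("h3 a")]

# Matching on the class alone also returns the unrelated link.
all_links = [a.text.strip() for a in soup.select("a.link")]

print(titles)     # ['More Bad Times Ahead for These 6 Big Tech Stocks']
print(all_links)  # ['More Bad Times Ahead for These 6 Big Tech Stocks', 'Some unrelated link']
```

The <span> holding "Barron's" is never selected, because "h3 a" only returns the <a> elements themselves rather than their <h3> container.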

If you want to be more specific:

for title in soup.select("h3.article__headline a"):
    print(title.text.strip())
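If you would rather keep a custom lambda as in the question, a roughly equivalent filter can be written with find_parent. This is a sketch under the same hypothetical markup, not part of the original answer. find_parent walks all ancestors, which mirrors the descendant combinator. class_="article__headline" matches when that is one of the tag's classes, like the CSS .article__headline, rather than requiring the class list to equal ['article__headline'] exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one headline link inside h3.article__headline
# and one unrelated link with the same "link" class.
HTML = """
<h3 class="article__headline">
  <span>Barron's</span>
  <a class="link" href="https://example.com/article">More Bad Times Ahead for These 6 Big Tech Stocks</a>
</h3>
<a class="link" href="https://example.com/other">Some unrelated link</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Lambda counterpart of "h3.article__headline a": keep an <a> only if some
# ancestor is an <h3> whose classes include "article__headline".
titles = [
    link.get_text(strip=True)
    for link in soup.find_all(
        lambda tag: tag.name == "a"
        and tag.find_parent("h3", class_="article__headline") is not None
    )
]

print(titles)  # ['More Bad Times Ahead for These 6 Big Tech Stocks']
```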
