get data from MarketWatch

Question

I am trying to scrape MarketWatch using BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import pandas


url = "https://www.marketwatch.com/investing/stock/khc/profile"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

Now I would like to extract the "P/E Current" and "Price to Sales Ratio" values, which appear in the following part of the HTML:

[...]
    Valuation

        P/E Current
        19.27

        P/E Ratio (with extraordinary items)
        19.55

        P/E Ratio (without extraordinary items)
        20.00

        Price to Sales Ratio
        1.55

        Price to Book Ratio
        0.75
[...]

How can I get them?

I use the command

section = soup.findAll('div', {'class' : 'section'})

but I don't know how to proceed from there to get the values I am interested in. Can you help?

0buz · Accepted Answer

This solution collects the data from every "section" div under the div with class "sixwide addgutter". The result is a list of dictionaries:

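import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/investing/stock/khc/profile"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'lxml')

# Container that holds the individual "section" blocks
base = soup.find('div', attrs={'class': 'sixwide addgutter'})
section = base.find_all('div', attrs={'class': 'section'})

all_data = []
for item in section:
    # The first <p> holds the label, the next <p> holds its value
    data = {}
    data['name'] = item.p.text
    data['value'] = item.p.find_next('p').text
    all_data.append(data)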

Each entry in all_data is a dictionary with a 'name' and a 'value' key. If you want the specifics, "P/E Current" and "Price to Sales Ratio" are all_data[0] and all_data[3] respectively.
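Relying on positions such as all_data[0] and all_data[3] breaks if the page adds or reorders rows, so it may be safer to look values up by label. The following is only a sketch: SAMPLE_HTML is a hypothetical, simplified stand-in for the real page (the actual markup may differ), extract_ratios is an illustrative helper name, and "html.parser" is used instead of "lxml" so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the page's valuation block.
# The real page may use different tags, classes, or nesting.
SAMPLE_HTML = """
<div class="sixwide addgutter">
  <div class="section"><p>P/E Current</p><p>19.27</p></div>
  <div class="section"><p>P/E Ratio (with extraordinary items)</p><p>19.55</p></div>
  <div class="section"><p>P/E Ratio (without extraordinary items)</p><p>20.00</p></div>
  <div class="section"><p>Price to Sales Ratio</p><p>1.55</p></div>
  <div class="section"><p>Price to Book Ratio</p><p>0.75</p></div>
</div>
"""


def extract_ratios(html):
    """Return a {label: value} dict built from the 'section' divs."""
    soup = BeautifulSoup(html, "html.parser")
    base = soup.find("div", attrs={"class": "sixwide addgutter"})
    if base is None:
        # Container not found (for example, the page layout changed)
        return {}

    ratios = {}
    for item in base.find_all("div", attrs={"class": "section"}):
        paragraphs = item.find_all("p")
        if len(paragraphs) < 2:
            # Skip sections that do not hold a label/value pair
            continue
        name = paragraphs[0].get_text(strip=True)
        value = paragraphs[1].get_text(strip=True)
        ratios[name] = value
    return ratios


ratios = extract_ratios(SAMPLE_HTML)
print(ratios["P/E Current"])           # 19.27
print(ratios["Price to Sales Ratio"])  # 1.55
```

With the live page, the same helper could be called as extract_ratios(requests.get(url).text), assuming the page still has the structure described in the answer. The values are strings; convert them with float() if you need numbers.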
