user4190374

Reputation: 49

Getting 'live' Yahoo Finance Data in Python

I've been scraping Bloomberg for currency prices using urllib2 and straightforward string functions to read the part of the HTML where the price is stored. It probably won't win any prizes for efficiency, but it has been suitable for my purposes. This is an extract from the code where the page is scraped.

    import urllib2  # Python 2 only

    # grab the HTML source as one big string
    response = urllib2.urlopen('https://www.bloomberg.com/quote/CHFGBP:CUR')
    page = response.read()

    # locate the HTML where the price is stored
    pricestart = page.find('meta itemprop="price" content=')+len('meta itemprop="price" content=')

    # take the next twenty characters
    price = page[pricestart:pricestart+20]

    # the price sits between double quotes in that substring
    pricesplit = price.split('"')

    # element 1 of the resulting list is the price; cast it to a float
    priceAsFloat = float(pricesplit[1])

    # and save it to the current prices dictionary
    pricesCurr[keys] = priceAsFloat
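
One weakness of the extract above: if the marker is missing, `find` returns -1 and the slice picks up unrelated text, so the `float` call fails with an unhelpful error. Below is a minimal Python 3 sketch of the same extraction as a standalone function that returns `None` instead. It swaps the `find`/slice/`split` steps for a regular expression, and it assumes the page contains a tag shaped like `<meta itemprop="price" content="0.8123">`. The name `extract_price` is made up for illustration.

```python
import re

# Assumed markup: <meta itemprop="price" content="0.8123">
# (the same fragment the extract above searches for).
PRICE_META = re.compile(r'meta itemprop="price" content="([^"]*)"')


def extract_price(page):
    """Return the price as a float, or None if the tag is missing or not numeric."""
    match = PRICE_META.search(page)
    if match is None:
        return None
    try:
        # Drop thousands separators such as "1,234.5" before converting.
        return float(match.group(1).replace(",", ""))
    except ValueError:
        return None
```

Because the function takes the page as a string, it can be tried on a saved copy of the HTML without making a request each time.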

I'd like to do the same thing for Yahoo Finance, as it updates far more frequently and gives the feeling of 'live' prices (I know they're delayed by 15 minutes).

However, the method that works on the Bloomberg HTML doesn't work for the Yahoo source.

Take this URL, for example: https://uk.finance.yahoo.com/quote/CHFJPY=X?p=GBPJPY=X

When I inspect the HTML returned by urllib2.urlopen, the current price isn't there in the text to scrape. Or at least I can't find it!

Can anyone offer any advice on how to go about scraping the Yahoo Finance HTML?

Upvotes: 1

Views: 3280

Answers (1)

userPinealbody

Reputation: 97

I have also been working with Yahoo Finance data. The value you're looking for is there, but it is buried. The following is an excerpt of the code I have been using to scrape Yahoo Finance:

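    from bs4 import BeautifulSoup
    import urllib3 as url
    import certifi as cert


    def get_stock_price(name):
        # verify TLS certificates against certifi's CA bundle
        http = url.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=cert.where())
        html_doc = http.request('GET', 'https://finance.yahoo.com/quote/' + name + '?p=' + name)
        soup = BeautifulSoup(html_doc.data, 'html.parser')
        # the price is the text of the span whose class attribute is exactly this string;
        # find() returns None if no such span exists, so get_text() would then raise
        return soup.find("span", class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)").get_text()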

Here, name is the shorthand name of the stock (e.g. 'tsla'). To find the appropriate value to scrape, I manually drilled down through the HTML until I found the section that highlighted the value I was searching for. The code above works with the site you provided.
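
If you'd rather check the span lookup on its own, without the network or third-party packages, here is a rough sketch that swaps BeautifulSoup for the standard library's `html.parser`. It assumes the price still lives in a `<span>` whose class attribute is exactly the string used above, which may well change whenever Yahoo restyles the page. `SpanTextFinder`, `find_span_text` and `PRICE_CLASS` are names invented for this example.

```python
from html.parser import HTMLParser

# Assumption: the class string from the answer's code; Yahoo may change it at any time.
PRICE_CLASS = "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"


class SpanTextFinder(HTMLParser):
    """Collect the text of the first <span> whose class attribute matches exactly."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while inside the matching span
        self.matched = False  # True once a matching span has been opened
        self.done = False     # True once that span has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "span":
            return
        if self.depth:
            self.depth += 1   # nested span inside the match
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1
            self.matched = True

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def find_span_text(html, target_class=PRICE_CLASS):
    """Return the stripped text of the first matching span, or None if absent."""
    parser = SpanTextFinder(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.matched else None
```

Feeding it a saved copy of the page (or `html_doc.data.decode()` from the function above) shows quickly whether the class string still matches; a `None` result means the markup has changed and the class needs to be looked up again.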

Upvotes: 1
