Reputation: 49
I've been scraping bloomberg for currency prices using urllib2 and straight forward text functions to read the part of the hmtl where the price is stored. Probably won't win any prizes for efficiency, but it has been suitable for my purposes. This is an extract from the code where the page is scraped.
#grab the html source as a big string
response = urllib2.urlopen('https://www.bloomberg.com/quote/CHFGBP:CUR')
page = response.read()
#locate the html where price is stored
pricestart = page.find('meta itemprop="price" content=')+len('meta itemprop="price" content=')
#plus twenty characters
price = page[pricestart:pricestart+20]
#find the data between double quotes in that substring
pricesplit = price.split('"')
#the 1st element of the output array is the price, cast it as a float
priceAsFloat = float(pricesplit[1])
#and save it to the current prices dictionary
pricesCurr[keys] = priceAsFloat
I'd like to do the same thing for Yahoo Finance as it's a lot more frequent in its updates and gives the feeling of 'live' prices (I know they're delayed for 15 minutes).
However, my method that works on the bloomberg html doesn't work for the yahoo source
Looking at this url, for example https://uk.finance.yahoo.com/quote/CHFJPY=X?p=GBPJPY=X
Inspecting the html returned by urllib2.urlopen - the current price isn't there in the text to scrape. Or at least I can't find it!
Can anyone offer any advice as to how to go about scraping the yahoo finance html?
Upvotes: 1
Views: 3280
Reputation: 97
I have also been working with Yahoo finance data. The value you're looking for is there, but it is buried. The following is an excerpt of code I have been using to scrape Yahoo finance:
from bs4 import BeautifulSoup
import urllib3 as url
import certifi as cert
def get_stock_price(name):
http = url.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=cert.where())
html_doc = http.request('GET', 'https://finance.yahoo.com/quote/' + name + '?p=' + name)
soup = BeautifulSoup(html_doc.data, 'html.parser')
return soup.find("span", class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)").get_text()
Where name
is the shorthand name of the stock (e.g. 'tsla'). To find the appropriate value to scrape, I manually drilled down through the html until i found the section which highlighted the value I was searching for. The code above works with the site you provided.
Upvotes: 1