Marco

Reputation: 245

Obtaining financial data from Google Finance that is outside the scope of the API

Google's finance API is incomplete -- many of the figures on a page such as:

http://www.google.com/finance?fstype=ii&q=NYSE:GE

are not available via the API.

I need this data to rank companies on Canadian stock exchanges according to Greenblatt's formula, which can be found by searching Google for "greenblatt index scans".

My question: what is the most intelligent/clean/efficient way of accessing and processing the data on these web pages? Is the tedious approach really necessary in this case, and if so, what is the best way of going about it? I'm currently learning Python for projects related to this one.

Upvotes: 5

Views: 4756

Answers (3)

Ryan Bright

Reputation: 3565

You could try asking Google to provide the missing APIs. Otherwise, you're stuck with screen scraping, which is never fun, prone to breaking without notice, and likely in violation of Google's terms of service.

But, if you still want to write a screen scraper, it's hard to beat a combination of mechanize and BeautifulSoup. BeautifulSoup is an HTML parser and mechanize is a Python-based web browser that will let you log in, store cookies, and generally navigate around like any other web browser.
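As a rough sketch of the parsing half, here is how BeautifulSoup might pull labelled figures out of a statement table. The markup, the `fs-table` id, and the helper names `parse_number` and `extract_rows` are all made up for illustration; the real page will be laid out differently, so the selectors need adjusting after inspecting its HTML. Fetching the page (with mechanize or anything else) is left out so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched statement page.
# The table id, labels, and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table id="fs-table">
  <tr><th>In Millions</th><th>Year 2</th><th>Year 1</th></tr>
  <tr><td>Total Revenue</td><td>1,234.50</td><td>1,100.00</td></tr>
  <tr><td>Net Income</td><td>(45.20)</td><td>-</td></tr>
</table>
"""


def parse_number(text):
    """Turn '1,234.50' into 1234.5, '(45.20)' into -45.2, '-' into None."""
    text = text.strip().replace(",", "")
    if text in ("", "-"):
        return None
    if text.startswith("(") and text.endswith(")"):
        # Accounting-style parentheses are assumed to mean a negative value.
        return -float(text[1:-1])
    return float(text)


def extract_rows(html, table_id="fs-table"):
    """Map each row label to its list of numeric cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = {}
    if table is None:
        return rows
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows, which use <th> cells here
        label = cells[0].get_text(strip=True)
        rows[label] = [parse_number(td.get_text()) for td in cells[1:]]
    return rows


figures = extract_rows(SAMPLE_HTML)
```

Returning `None` for a missing table or an empty cell keeps a layout change from silently producing wrong numbers, which matters given how easily scrapers break.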

Upvotes: 4

Eli

Reputation: 5620

BeautifulSoup would be the preferred method of HTML parsing with Python.

Have you looked into options besides Google (e.g. Yahoo Finance API)?

Upvotes: 3

Paul Tarjan

Reputation: 50642

Scraping web pages always sucks, but I would recommend converting them to XML (via tidy or some other HTML -> XML program) and then using XPath to walk the nodes that you are interested in.
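A minimal sketch of the XPath step, assuming the page has already been converted to well-formed XML. The sample document and the `fs-table` id are invented for illustration, and the standard library's ElementTree is used here only because it needs no extra install; it supports a limited XPath subset, so a fuller engine such as lxml may be needed for more complex queries. Real tidy output may also carry an XHTML namespace, which would have to be included in the paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed output of an HTML -> XML conversion.
# Structure, id, and values are placeholders, not real data.
SAMPLE_XHTML = """<html><body>
<table id="fs-table">
<tr><td>Total Revenue</td><td>1,234.50</td></tr>
<tr><td>Net Income</td><td>(45.20)</td></tr>
</table>
</body></html>"""

root = ET.fromstring(SAMPLE_XHTML)

# Select every row of the table with the assumed id.
rows = root.findall(".//table[@id='fs-table']/tr")
labels = [row.find("td[1]").text for row in rows]

# Look up one figure by its row label: the row that has a <td> child
# whose text is exactly 'Net Income', then that row's second cell.
net_income_cell = root.find(".//tr[td='Net Income']/td[2]")
net_income_text = net_income_cell.text if net_income_cell is not None else None
```

The values come back as strings, so they still need cleaning (commas, parentheses for negatives) before any ranking arithmetic.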

Upvotes: 0
