user1357015

Python BeautifulSoup4: parsing Google Finance data

I'm new to BeautifulSoup and to scraping in general, so I'm trying to get my feet wet.

I'd like to get the first row of information for the Dow Jones Industrial Average from here: http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE

I can download the page, and print(soup) outputs everything, but I can't drill down far enough. How do I select the rows of the table I save into `table`? And how do I get just the first row?

Thank you so much for your help!

import urllib.request
from bs4 import BeautifulSoup

DJIA_URL = "http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE"

def downloadData(queryString):
    # Fetch the page and decode it with the charset from the response headers,
    # falling back to UTF-8 when the server doesn't declare one.
    with urllib.request.urlopen(queryString) as url:
        encoding = url.headers.get_content_charset() or "utf-8"
        result = url.read().decode(encoding)
    return result

raw_html = downloadData(DJIA_URL)
# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning.
soup = BeautifulSoup(raw_html, "html.parser")

#print(soup)

# Note: find_all() (findAll in older code) returns a list of matching tables.
table = soup.find_all("table", {"class": "gf-table historical_price"})
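If the request is refused, or the HTML differs from what a browser shows, one thing worth trying is sending an explicit User-Agent header, since some sites treat urllib's default one differently. A minimal sketch; `build_request` and the `USER_AGENT` value are hypothetical names, not part of the original code:

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any reasonable value would do.
USER_AGENT = "Mozilla/5.0 (compatible; djia-scraper-example)"

def build_request(url, user_agent=USER_AGENT):
    """Wrap url in a Request object that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# urlopen() accepts a Request object as well as a plain URL string, e.g.:
# with urllib.request.urlopen(build_request(DJIA_URL), timeout=10) as url: ...
```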

Answers (1)

Martijn Pieters

You want the second tr row of the table then (the first row holds the column headers):

prices = soup.find('table', class_='historical_price')
rows = prices.find_all('tr')
print(rows[1])  # rows[0] is the header row
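The question's find_all() call returns a list, while find() returns the first matching table; because bs4 treats class as a multi-valued attribute, class_='historical_price' matches a table whose class is "gf-table historical_price". A small self-contained sketch comparing three ways to reach the rows; the sample HTML is a stand-in built from the prices in the output further down, and its column layout is assumed:

```python
from bs4 import BeautifulSoup

# Stand-in for the real page: one <th> header row, then <td> data rows (assumed layout).
SAMPLE_HTML = """<table class="gf-table historical_price">
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
<tr><td>Feb 7, 2014</td><td>15,630.64</td><td>15,798.51</td><td>15,625.53</td><td>15,794.08</td><td>105,782,495</td></tr>
<tr><td>Feb 6, 2014</td><td>15,443.83</td><td>15,632.09</td><td>15,443.00</td><td>15,628.53</td><td>106,979,691</td></tr>
</table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1. find_all() returns a list, so index [0] to get the table before looking for rows.
rows_via_find_all = soup.find_all("table", {"class": "gf-table historical_price"})[0].find_all("tr")

# 2. find() returns the first matching table directly; class_ matches any one class.
rows_via_find = soup.find("table", class_="historical_price").find_all("tr")

# 3. A CSS selector reaches the <tr> elements in a single call.
rows_via_select = soup.select("table.historical_price tr")

def cell_texts(row):
    """Return the stripped text of every <td> cell in a row."""
    return [td.get_text(strip=True) for td in row.find_all("td")]

print(cell_texts(rows_via_find[1]))
# ['Feb 7, 2014', '15,630.64', '15,798.51', '15,625.53', '15,794.08', '105,782,495']
```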

Or, to loop over all rows with price information, skip any row that contains th elements:

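for row in rows:
    if row.th:
        continue  # header row: it holds <th> cells, not prices
    print([td.text.strip() for td in row.find_all('td')])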

Or use the header cells in that first row as dictionary keys:

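# Column names come from the <th> cells of the header row.
keys = [th.text.strip() for th in rows[0].find_all('th')]
for row in rows[1:]:
    # Pair each column name with the text of the matching <td> cell.
    data = {key: td.text.strip() for key, td in zip(keys, row.find_all('td'))}
    print(data)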

which produces (output from the original Python 2 run; Python 3 prints the same data without the u'' prefixes):

{u'Volume': u'105,782,495', u'High': u'15,798.51', u'Low': u'15,625.53', u'Date': u'Feb 7, 2014', u'Close': u'15,794.08', u'Open': u'15,630.64'}
{u'Volume': u'106,979,691', u'High': u'15,632.09', u'Low': u'15,443.00', u'Date': u'Feb 6, 2014', u'Close': u'15,628.53', u'Open': u'15,443.83'}
{u'Volume': u'105,125,894', u'High': u'15,478.21', u'Low': u'15,340.69', u'Date': u'Feb 5, 2014', u'Close': u'15,440.23', u'Open': u'15,443.00'}
{u'Volume': u'124,106,548', u'High': u'15,481.85', u'Low': u'15,356.62', u'Date': u'Feb 4, 2014', u'Close': u'15,445.24', u'Open': u'15,372.93'}

etc.
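All of the scraped values are strings with thousands separators. As a sketch, assuming the column names shown above and an English locale for the month abbreviation, each dictionary can be converted to a date, floats and an int, then saved with the standard csv module; parse_row and write_csv are hypothetical helper names:

```python
import csv
import io
from datetime import datetime

# One row in the shape produced by the dictionary comprehension above
# (values copied from the first line of its output).
ROW = {"Date": "Feb 7, 2014", "Open": "15,630.64", "High": "15,798.51",
       "Low": "15,625.53", "Close": "15,794.08", "Volume": "105,782,495"}

FIELDNAMES = ["Date", "Open", "High", "Low", "Close", "Volume"]

def parse_row(row):
    """Convert scraped strings into a date, float prices and an int volume."""
    # "%b" expects an English month abbreviation such as "Feb".
    parsed = {"Date": datetime.strptime(row["Date"], "%b %d, %Y").date()}
    for key in ("Open", "High", "Low", "Close"):
        parsed[key] = float(row[key].replace(",", ""))  # drop thousands separators
    parsed["Volume"] = int(row["Volume"].replace(",", ""))
    return parsed

def write_csv(rows, fileobj):
    """Write parsed rows to fileobj as CSV, preceded by a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()
write_csv([parse_row(ROW)], buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a StringIO, open it with newline="" as the csv module documentation recommends.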
