Reputation: 11696
I'm new to BeautifulSoup and to scraping in general, so I'm trying to get my feet wet, so to speak.
I'd like to get the first row of information for the Dow Jones Industrial Average from here: http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE
While I can read the data and print(soup) outputs everything, I can't seem to get down far enough. How would I select the rows of the table that I save into table? And how would I get just the first row?
Thank you so much for your help!
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
import json
import sys
import os
import time
import csv
import errno
DJIA_URL = "http://www.google.com/finance/historical?q=INDEXDJX%3A.DJI&ei=ZN_2UqD9NOTt6wHYrAE"
def downloadData(queryString):
    with urllib.request.urlopen(queryString) as url:
        encoding = url.headers.get_content_charset()
        result = url.read().decode(encoding)
    return result
raw_html = downloadData(DJIA_URL)
soup = BeautifulSoup(raw_html)
#print(soup)
table = soup.findAll("table", {"class":"gf-table historical_price"})
Upvotes: 0
Views: 1002
Reputation: 1124658
You want the second tr table row, then:
prices = soup.find('table', class_='historical_price')
rows = prices.find_all('tr')
print(rows[1])
or, to list all rows with price info, skip any row that contains th elements:
for row in rows:
    if row.th:
        continue  # skip the header row
    print(row)
or use that first header as a source for dictionary keys:
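keys = [th.text.strip() for th in rows[0].find_all('th')]
for row in rows[1:]:
    # pair each header label with the matching cell text
    data = {key: td.text.strip() for key, td in zip(keys, row.find_all('td'))}
    print(data)
which produces (the output below is from Python 2; Python 3 prints the same values without the u prefixes):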
{u'Volume': u'105,782,495', u'High': u'15,798.51', u'Low': u'15,625.53', u'Date': u'Feb 7, 2014', u'Close': u'15,794.08', u'Open': u'15,630.64'}
{u'Volume': u'106,979,691', u'High': u'15,632.09', u'Low': u'15,443.00', u'Date': u'Feb 6, 2014', u'Close': u'15,628.53', u'Open': u'15,443.83'}
{u'Volume': u'105,125,894', u'High': u'15,478.21', u'Low': u'15,340.69', u'Date': u'Feb 5, 2014', u'Close': u'15,440.23', u'Open': u'15,443.00'}
{u'Volume': u'124,106,548', u'High': u'15,481.85', u'Low': u'15,356.62', u'Date': u'Feb 4, 2014', u'Close': u'15,445.24', u'Open': u'15,372.93'}
etc.
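Every value in those dictionaries is still a string, with thousands separators in the numbers. If you want to do arithmetic or sort by date, a conversion step along these lines may help. This is only a sketch: parse_row is a hypothetical helper, not part of BeautifulSoup, and it assumes every row has the six columns shown above, with dates formatted like "Feb 7, 2014" and no placeholder values such as "-".

```python
from datetime import datetime

# Hypothetical helper (not part of BeautifulSoup): convert one scraped
# row of strings into a date, floats and an int.
def parse_row(row):
    # Assumes dates look like "Feb 7, 2014" (English month abbreviation).
    parsed = {"Date": datetime.strptime(row["Date"], "%b %d, %Y").date()}
    for key in ("Open", "High", "Low", "Close"):
        # Drop the thousands separators before converting.
        parsed[key] = float(row[key].replace(",", ""))
    parsed["Volume"] = int(row["Volume"].replace(",", ""))
    return parsed

# Sample row copied from the output above.
sample = {
    "Date": "Feb 7, 2014", "Open": "15,630.64", "High": "15,798.51",
    "Low": "15,625.53", "Close": "15,794.08", "Volume": "105,782,495",
}
typed = parse_row(sample)
print(typed["Date"].isoformat(), typed["Close"], typed["Volume"])
# prints: 2014-02-07 15794.08 105782495
```

Inside the loop you would call parse_row(data) on each dictionary instead of printing it.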
Upvotes: 2