NitheshKHP

Reputation: 391

Scraping data through paginated table using python

I am scraping data from Google Finance's historical page for a stock (http://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=PLfUVIDTDuSRiQKhwYGQBQ).

I can scrape the 30 rows shown on the current page. The issue I am facing is that I am unable to scrape the rest of the data in the table (rows 31-241). How do I go to the next page or link? Following is my code:

import urllib2
import xlwt  # to write into an Excel spreadsheet
from bs4 import BeautifulSoup

# Main Coding Section

stock_links = open('stock_link_list.txt', 'r')  # opening text file for reading

for url in stock_links:
    OurFile = urllib2.urlopen(url)
    OurHtml = OurFile.read()
    OurFile.close()

    soup = BeautifulSoup(OurHtml)
    # grab the historical-price table and keep only its text
    soup1 = soup.find("table", {"class": "gf-table historical_price"}).get_text()

    end = url.index('&')
    filename = url[47:end]
    file = open(filename, 'w')  # opening text file for writing
    file.write(soup1)           # writing the table text to the file
    file.close()                # closing the text file

Upvotes: 1

Views: 2516

Answers (2)

Umair Ayub

Reputation: 21201

At first sight, the Row Limit option only lets you show a maximum of 30 rows per page, but I manually changed the query string parameters to larger numbers and realized you can view at most 200 rows per page.

Change URL to

https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=OM3UVLFtkLnzBsjIgYAI&start=0&num=200

It will show 200 rows

Then change the parameters to start=200&num=400 for the next block of rows.
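For example, a minimal sketch of walking the table 200 rows at a time (assuming the requests library and that the start/num parameters behave as described above; the ei value is copied from the URL in this answer):

import requests
from bs4 import BeautifulSoup

base = ("https://www.google.com/finance/historical"
        "?q=NSE%3ASIEMENS&ei=OM3UVLFtkLnzBsjIgYAI&start={}&num=200")

for start in (0, 200):  # 241 rows in total, so two requests of up to 200 rows
    page = requests.get(base.format(start))
    soup = BeautifulSoup(page.content, "html.parser")
    table = soup.find("table", {"class": "gf-table historical_price"})
    if table is None:  # nothing left to fetch
        break
    print(table.get_text())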

More generally, if you have many other such links, you can scrape the pagination area (the last TR of the table), grab the links to the next pages from it, and scrape those in turn.
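A rough sketch of that idea, assuming the pagination links are plain anchor tags in the table's last row (the exact markup of Google Finance's pagination area is not shown here, so those selectors are an assumption):

import requests
from bs4 import BeautifulSoup
from urlparse import urljoin  # Python 2; use urllib.parse on Python 3

url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
table = soup.find("table", {"class": "gf-table historical_price"})

# assumed layout: the last <tr> of the table holds the pagination links
pagination_row = table.find_all("tr")[-1]
next_page_urls = [urljoin(url, a["href"]) for a in pagination_row.find_all("a")]

for page_url in next_page_urls:
    page_soup = BeautifulSoup(requests.get(page_url).content, "html.parser")
    print(page_soup.find("table", {"class": "gf-table historical_price"}).get_text())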

Upvotes: 0

Padraic Cunningham

Reputation: 180391

You will have to fine-tune this, and I would catch more specific errors, but you can keep increasing start to get the next block of data:

url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=W8LUVLHnAoOswAOFs4DACg&start={}&num=30"

from bs4 import BeautifulSoup
import requests

# Main Coding Section
start = 0
while True:
    try:
        nxt = url.format(start)
        r = requests.get(nxt)
        soup = BeautifulSoup(r.content)
        # raises an AttributeError once no table comes back, which ends the loop
        print(soup.find("table", {"class": "gf-table historical_price"}).get_text())
    except Exception as e:
        print(e)
        break
    start += 30

This gets all the table data up to the last date, Feb 7:

......

Date
Open
High
Low
Close
Volume

Feb 7, 2014
552.60
557.90
548.25
551.50
119,711
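If you want the data in a file as in the question, a minimal variation of the same loop could write each page's table text out as it goes (the output filename here is hypothetical, and the "html.parser" argument is just an explicit parser choice):

from bs4 import BeautifulSoup
import requests

url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=W8LUVLHnAoOswAOFs4DACg&start={}&num=30"

start = 0
with open("NSE-SIEMENS.txt", "w") as out:  # hypothetical output filename
    while True:
        r = requests.get(url.format(start))
        soup = BeautifulSoup(r.content, "html.parser")
        table = soup.find("table", {"class": "gf-table historical_price"})
        if table is None:  # no table means we ran past the last page
            break
        out.write(table.get_text().encode("utf-8"))
        start += 30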

Upvotes: 2
