Reputation: 391
I am scraping data from Google Finance's historical page for a stock (http://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=PLfUVIDTDuSRiQKhwYGQBQ).
I can scrape the 30 rows on the current page. The issue I am facing is that I am unable to scrape the rest of the data in the table (rows 31-241). How do I go to the next page or link? Following is my code:
import urllib2
import xlwt  # to write into excel spreadsheet
from bs4 import BeautifulSoup

# Main Coding Section
stock_links = open('stock_link_list.txt', 'r')  # opening text file for reading
#url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=zHXOVLPnApG2iALxxYCADQ"

for url in stock_links:
    OurFile = urllib2.urlopen(url)
    OurHtml = OurFile.read()
    OurFile.close()
    soup = BeautifulSoup(OurHtml)
    #soup1 = soup.find("div", {"class": "gf-table-wrapper sfe-break-bottom-16"}).get_text()
    soup1 = soup.find("table", {"class": "gf-table historical_price"}).get_text()
    end = url.index('&')
    filename = url[47:end]
    file = open(filename, 'w')  # opening text file for writing
    file.write(soup1)
    #file.write(soup1.get_text())  # writing to the text file
    file.close()  # closing the text file
Upvotes: 1
Views: 2516
Reputation: 21201
At first sight, the Row Limit option allows showing at most 30 rows per page, but I manually changed the query-string parameters to larger values and realized we can view at most 200 rows per page.
Change the URL to
https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=OM3UVLFtkLnzBsjIgYAI&start=0&num=200
It will show 200 rows.
Then change to start=200&num=200 to get the next 200 rows.
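As a minimal sketch of that parameter approach (requests and BeautifulSoup are assumed, as in the answer below; the ei token is dropped, and the 241-row total is taken from the question):

import requests
from bs4 import BeautifulSoup

# Walk the table 200 rows at a time by rewriting start/num.
base = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&start={}&num=200"
for start in range(0, 241, 200):  # 241 rows total per the question
    page = requests.get(base.format(start))
    table = BeautifulSoup(page.content).find("table", {"class": "gf-table historical_price"})
    if table is None:  # ran past the last page
        break
    print(table.get_text())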
But more logically, if you have many other links of this kind, you can scrape the pagination area (the last tr), grab the links to the next pages from it, and scrape those.
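A rough sketch of that idea, assuming (as this answer does, unverified against the live page) that the next-page anchors sit in the table's last tr:

import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&start=0&num=200"
soup = BeautifulSoup(requests.get(url).content)
table = soup.find("table", {"class": "gf-table historical_price"})

# Assumption: pagination links live in the last row of the table.
last_row = table.find_all("tr")[-1]
next_pages = [a["href"] for a in last_row.find_all("a", href=True)]
print(next_pages)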
Upvotes: 0
Reputation: 180391
You will have to fine-tune it, and I would catch more specific exceptions, but you can keep increasing start to get the next data:
from bs4 import BeautifulSoup
import requests

# Main Coding Section
url = "https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=W8LUVLHnAoOswAOFs4DACg&start={}&num=30"

start = 0
while True:
    try:
        nxt = url.format(start)
        r = requests.get(nxt)
        soup = BeautifulSoup(r.content)
        # find() returns None once start runs past the last page, so
        # get_text() raises AttributeError and the loop breaks.
        print(soup.find("table", {"class": "gf-table historical_price"}).get_text())
    except Exception as e:
        print(e)
        break
    start += 30
This gets all the table data up to the last date, Feb 7:
......
Date
Open
High
Low
Close
Volume
Feb 7, 2014
552.60
557.90
548.25
551.50
119,711
Upvotes: 2