Reputation: 144
Could anyone help me revise this Python program so that it correctly submits information to the "Date Range" query and then extracts the "Close" data that comes back? I am scraping data from the following URL:
http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
And this is my current code, which returns "[ ]".
from lxml import html
import requests


def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
    url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % (symbol)
    form_data = {
        'a': stMonth,  # 00 is January, 01 is Feb., etc.
        'b': stDate,
        'c': stYear,
        'd': enMonth,  # 00 is January, 01 is Feb., etc.
        'e': enDate,
        'f': enYear,
        'submit': 'submit',
    }
    response = requests.post(url, data=form_data)
    tree = html.document_fromstring(response.content)
    p = tree.xpath('//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]/tbody/tr/td/table/tbody/tr[2]/td[7]/text()')
    print(p)


historic_quotes('baba',00,11,2010,00,11,2012)
I am an overall Python novice, and greatly appreciate any and all help. Thanks for reading!
Also, I realize now that the HTML source may be of help, but it is huge, so here's an XPath to the relevant part:
//*[@id="daterange"]/table
Expected output is a list of the "Close" values for the different dates. As previously stated, the current output is just "[ ]". I believe something may be incorrect in the form_data, perhaps the "submit" entry.
Upvotes: 1
Views: 208
Reputation: 474171
The main issue was that you needed to make a GET request, not a POST.
Plus, @Paul Lo is right about the date ranges. For the sake of example, I'm querying from 2010 to 2015.
Also, you have to pass the query parameters as strings. The literal 00 evaluates to the integer 0, and requests converts the int 0 to the string "0". As a result, 0 was sent as the month parameter value instead of 00.
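To make the int-versus-string point concrete, here is a minimal sketch (Python 3) that uses the standard library's urlencode as a stand-in for the way an HTTP client serializes query parameters. The date_params helper is hypothetical and not part of the original code, and the sketch assumes the month is expected as a two-digit, zero-based string, as described above.

```python
from urllib.parse import urlencode


def date_params(st_month, st_day, st_year, en_month, en_day, en_year):
    """Hypothetical helper: build the a..f query parameters as strings.

    Assumes months are zero-based and must be sent as two digits ('00' = January).
    """
    return {
        'a': '%02d' % st_month,  # zero-pad so that 0 becomes '00'
        'b': str(st_day),
        'c': str(st_year),
        'd': '%02d' % en_month,
        'e': str(en_day),
        'f': str(en_year),
    }


# An integer 0 is serialized as "0", so the leading zero is lost.
print(urlencode({'a': 0}))
# A string keeps exactly the characters that were given.
print(urlencode(date_params(0, 11, 2010, 0, 11, 2015)))
```

Building the strings in one place keeps the call site free to use plain integers while still sending the padded values.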
Here is a fixed version with a modified part that gets the amounts:
from lxml import html
import requests


def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
    url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % symbol
    # Pass every value as a string so that '00' is not collapsed to '0'.
    params = {
        'a': stMonth,
        'b': stDate,
        'c': stYear,
        'd': enMonth,
        'e': enDate,
        'f': enYear,
        'submit': 'submit',
    }
    # Send the values in the query string with GET instead of a POST body.
    response = requests.get(url, params=params)
    tree = html.document_fromstring(response.content)
    # Take the text of the fifth cell in every data row of the prices table.
    for amount in tree.xpath('//table[@class="yfnc_datamodoutline1"]//tr[td[@class="yfnc_tabledata1"]]//td[5]/text()'):
        print(amount)


historic_quotes('baba', '00', '11', '2010', '00', '11', '2015')
Prints:
105.95
105.95
105.52
108.77
110.65
109.25
109.02
105.77
104.70
105.11
104.97
103.88
107.48
105.07
107.90
...
90.57
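Since the question asks for a list of "Close" values rather than printed lines, the strings returned by the XPath query could be collected and converted to numbers. The following is only a sketch: to_floats is a hypothetical helper, the sample list reuses a few of the values printed above, and stripping thousands separators is an assumption about how larger prices might be formatted.

```python
def to_floats(cells):
    """Hypothetical helper: convert scraped cell texts to floats.

    Skips empty cells and removes commas, assuming they are thousands separators.
    """
    values = []
    for cell in cells:
        text = cell.strip().replace(',', '')
        if text:
            values.append(float(text))
    return values


# Sample input taken from the first few printed values above.
sample = ['105.95', '105.95', '105.52', '108.77']
closes = to_floats(sample)
print(closes)
```

Inside historic_quotes, the same helper could be applied to the result of tree.xpath(...) and the list returned instead of printed.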
Upvotes: 2
Reputation: 6148
I doubt that Alibaba (BABA) has data for 2010/1/11 to 2012/1/11, since it only went public recently.
You might want to check the raw data in response.content first, and then try changing the range, e.g. historic_quotes('baba',00,11,2014,00,11,2015)
Upvotes: 1