Python: Scrape Data from Web after Inputting Info

Question

Could anyone help me revise this Python program so that it correctly submits information to the "Date Range" query and then extracts the returned "Close" data? I am scraping data from the following URL:

http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices

And this is my current code, which returns "[ ]".

from lxml import html
import requests


def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
    url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % (symbol)

    form_data = {
        'a': stMonth,  #00 is January, 01 is Feb., etc.
        'b': stDate,
        'c': stYear,
        'd': enMonth,  #00 is January, 01 is Feb., etc.
        'e': enDate,
        'f': enYear,
        'submit': 'submit',
    }
    response = requests.post(url, data=form_data)

    tree = html.document_fromstring(response.content)
    p = tree.xpath('//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]/tbody/tr/td/table/tbody/tr[2]/td[7]/text()')
    print p

historic_quotes('baba',00,11,2010,00,11,2012)

I am an overall Python novice, and greatly appreciate any and all help. Thanks for reading!

Also, I realize now that the HTML source may be of help, but it is huge, so here's an XPath to it:

//*[@id="daterange"]/table

The expected output is a list of the "Close" values for the different dates. As previously stated, the current output is just "[ ]". I believe something may be incorrect in the form_data, perhaps the "submit" entry.

alecxe · Accepted Answer

The main issue was that you needed to make a GET request, not a POST.
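To make the difference concrete, here is a minimal sketch that builds both kinds of request without sending anything. It swaps in the standard library's urllib (Python 3) instead of requests so it runs offline; requests.get(url, params=...) and requests.post(url, data=...) place the fields in the same two places. The base URL and field names are taken from the question, and the names BASE_URL, fields, get_request and post_request are only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL and form fields as used in the question (illustrative values).
BASE_URL = 'https://finance.yahoo.com/q/hp'
fields = {'s': 'baba', 'a': '00', 'b': '11', 'c': '2010'}
query = urlencode(fields)

# GET: the fields travel in the URL's query string and there is no body.
get_request = Request(BASE_URL + '?' + query)

# POST: the same fields travel in the request body; the URL stays bare.
post_request = Request(BASE_URL, data=query.encode('ascii'))

# Nothing is sent here; the objects are only inspected.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

A page that reads its date range from the query string never sees fields that arrive in a POST body, which would explain the empty result.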

Plus, @Paul Lo is right about the date ranges. For the sake of example, I'm querying from 2010 to 2015.

Also, you have to pass the query parameters as strings. The literal 00 evaluates to the integer 0, and requests serializes the integer 0 as the string "0". As a result, the month was sent as 0 instead of 00.
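A quick way to see the lost leading zero, as a sketch in Python 3: the standard library's urlencode is used here as a stand-in for the encoding requests performs on simple values, and two_digit is a hypothetical helper for callers who would rather pass integers.

```python
from urllib.parse import urlencode


def two_digit(value):
    """Format an integer (e.g. a month index) as a zero-padded string."""
    return '%02d' % value


# An integer 0 is serialized as "0", so the leading zero is gone.
print(urlencode({'a': 0}))

# A string is sent exactly as written.
print(urlencode({'a': '00'}))

# Padding integers before building the parameters keeps both digits.
print(urlencode({'a': two_digit(0), 'b': two_digit(11)}))
```

Passing the values as strings in the first place, as in the fixed version, has the same effect without the helper.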

Here is a fixed version with a modified part that gets the amounts:

from lxml import html
import requests

def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
    url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % symbol

    params = {
        'a': stMonth,
        'b': stDate,
        'c': stYear,
        'd': enMonth,
        'e': enDate,
        'f': enYear,
        'submit': 'submit',
    }
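    # Send the form fields in the query string (GET), not in a POST body.
    response = requests.get(url, params=params)

    tree = html.document_fromstring(response.content)
    # Print the fifth cell (td[5]) of every data row in the prices table.
    for amount in tree.xpath('//table[@class="yfnc_datamodoutline1"]//tr[td[@class="yfnc_tabledata1"]]//td[5]/text()'):
        print(amount)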

historic_quotes('baba', '00', '11', '2010', '00', '11', '2015')

Prints:
