MJS

Reputation: 1623

Pandas DataReader work-around for Google finance

I pull historical data for a large universe of stocks and ETFs daily. Quandl has pretty good free coverage of US Equities, but they do not have historical data for ETFs so I use the Google API as a backup for Quandl.

The recent Google Finance "renovation" hasn't left me with a great alternative, so I am trying to apply Brad Solomon's work (thanks Brad, link below) to a list of symbols. I assume it is unlikely to work without a loop, given that he builds a URL per symbol. Any clever ideas welcome.

Related question: How come pandas_datareader for google doesn't work?

Thanks.

Upvotes: 1

Views: 1176

Answers (1)

Brad Solomon

Reputation: 40948

Under the hood, pandas-datareader is looping through each symbol that you pass and making http requests one by one.

Here's the function that does that in the base class, from which the google- and yahoo-related classes inherit: base._DailyBaseReader._dl_mult_symbols.

The magic is that these are appended to a list and then aggregated into a pandas Panel.

I would note, however, that Panel is deprecated and you can get the same functionality in a DataFrame with a MultiIndex, a structure that's technically 2-dimensional but replicates higher dimensionality in practice.
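To make that concrete, here is a minimal sketch (with made-up price data, not a live download) of the Panel-replacement pattern: build a dict of per-symbol DataFrames, concatenate along `axis=1`, and you get MultiIndex columns keyed by `(symbol, field)`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2010-01-04', periods=3)

# One DataFrame per symbol, as _dl_mult_symbols would collect them
frames = {
    sym: pd.DataFrame(np.arange(6.).reshape(3, 2),
                      index=dates, columns=['Open', 'Close'])
    for sym in ['AAPL', 'GE']
}

# Concatenating the dict along axis=1 gives MultiIndex columns:
# outer level = symbol, inner level = field -- a 2-D stand-in
# for the 3-D Panel
data = pd.concat(frames, axis=1)

# A cross-section on the inner level pulls one field for all symbols
close = data.xs('Close', axis=1, level=1)
```

`close` here is an ordinary DataFrame with one column per symbol, which is usually what you want downstream anyway.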

So, here's the bare bones of what you could do. Note that I'm skipping a lot of the functionality embedded in the package itself, such as parsing string dates to datetime.

import datetime
from io import StringIO

import requests
from urllib.parse import urlencode  # pandas.io.common.urlencode was removed in later pandas versions
import pandas as pd

BASE = 'http://finance.google.com/finance/historical'


def get_params(sym, start, end):
    params = {
        'q': sym,
        'startdate': start.strftime('%Y/%m/%d'),
        'enddate': end.strftime('%Y/%m/%d'),
        'output': "csv"
    }
    return params


def build_url(sym, start, end):
    params = get_params(sym, start, end)
    return BASE + '?' + urlencode(params)


def get_one_data(sym, start=None, end=None):
    if not start:
        start = datetime.datetime(2010, 1, 1)
    if not end:
        end = datetime.datetime.today()
    url = build_url(sym, start, end)
    data = requests.get(url).text
    return pd.read_csv(StringIO(data), index_col='Date',
                       parse_dates=True).sort_index()


def get_multiple(sym, start=None, end=None, return_type='Panel'):
    if isinstance(sym, str):
        return get_one_data(sym, start=start, end=end)
    elif isinstance(sym, (list, tuple, set)):
        res = {}
        for s in sym:
            res[s] = get_one_data(s, start, end)
        # The actual module also implements 'passed' and 'failed'
        #     checks here, and uses chunking to get around
        #     data-retrieval limits (I believe)

    if return_type.lower() == 'panel':
        return pd.Panel(res).swapaxes('items', 'minor')
    elif return_type.lower() == 'mi':  # MultiIndex DataFrame
        return pd.concat(res, axis=1)

An example:

syms = ['AAPL', 'GE']
data = get_multiple(syms, return_type='mi')

# Here's how you would filter down to Close prices
#   on MultiIndex columns
data.xs('Close', axis=1, level=1) 

              AAPL     GE
Date                     
2010-01-04   30.57  15.45
2010-01-05   30.63  15.53
2010-01-06   30.14  15.45
2010-01-07   30.08  16.25
2010-01-08   30.28  16.60
...
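Going the other way is just as easy: indexing the MultiIndex columns on the outer level recovers one symbol's full frame, analogous to `panel[sym]` in the old Panel API. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2010-01-04', periods=3)

# MultiIndex-column frame of the same shape get_multiple(return_type='mi')
# would produce, filled with dummy values
data = pd.concat(
    {sym: pd.DataFrame(np.ones((3, 2)), index=dates,
                       columns=['Open', 'Close'])
     for sym in ['AAPL', 'GE']},
    axis=1)

# Plain [] on the outer level gives one symbol's full OHLC-style frame
aapl = data['AAPL']
```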

Upvotes: 1
