Scilear

Reputation: 238

Python pandas datareader no longer works for yahoo-finance changed url

Since Yahoo discontinued its API support, pandas-datareader now fails:

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 5, 17)
web.DataReader('GOOGL', 'yahoo', start, end)

HTTPError: HTTP Error 401: Unauthorized

Is there any unofficial library that would let us temporarily work around the problem? Anything on Quandl, maybe?

Upvotes: 16

Views: 56662

Answers (11)

JeeyCi

Reputation: 597

No loop is needed when using pandas_datareader (pdr) with yfinance as the source:

from pandas_datareader import data as pdr
import yfinance as yf
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt

yf.pdr_override()
##Symbol    Name
##^IRX    13 WEEK TREASURY BILL
##^FVX    Treasury Yield 5 Years
##^TNX    CBOE Interest Rate 10 Year T No
##^TYX    Treasury Yield 30 Years
symbols = ['^TYX', '^TNX', '^FVX', '^IRX',  'DX=F', '6E=F', '6J=F', 'ES=F', 'GC=F', 'CL=F']
df = pdr.get_data_yahoo(symbols, start= '2015-01-01', end= dt.datetime.today())['Adj Close']
print(df)

# plot each of the ten symbols in its own subplot
df.plot(subplots=True, layout=(5, 2), figsize=(12,12), sharex=False,  ylabel='%', title='Current Assets')
plt.show()

For the FRED source:

import pandas_datareader as pdr
import matplotlib.pyplot as plt
import datetime as dt
print(pdr.__version__)

f1 = 'TEDRATE' # TED spread
f2 = 'T10Y2Y' # 10-year minus 2-year Treasury constant maturity
f3 = 'T10Y3M' # 10-year minus 3-month Treasury constant maturity
## ('GS10') 10-year constant maturity yields on U.S. government bonds
f4 = 'LIOR3M'
f5 = 'AMERIBOR'     # Overnight Unsecured AMERIBOR Benchmark Interest Rate
# https://fredblog.stlouisfed.org/2022/03/interest-rates-on-secured-and-unsecured-overnight-lending/?utm_source=series_page&utm_medium=related_content&utm_term=related_resources&utm_campaign=fredblog

tr = pdr.DataReader( [f1, f2, f3, f4, f5],  "fred", '1/1/10', dt.datetime.today())
print(tr.head())
tr.plot(grid = True) # Plot the FRED series downloaded above
plt.show()

Upvotes: 0

JeeyCi

Reputation: 597

yfinance also works well on its own, without pandas_datareader:

import yfinance as yf
import matplotlib.pyplot as plt

# use yfinance to collect the data
share_data = yf.download(['SPY'],
                         period="3mo",
                         interval="1d",
                         auto_adjust=True,
                         back_adjust=True,
                         prepost=True)

print(share_data.head())

share_data["Close"].plot(grid = True) # Plot the adjusted closing price of SPY
plt.show()

Upvotes: 0

Eric McMillan

Reputation: 11

To add to Tony Shouse's answer: the following code works for me in Visual Studio Code if you want to gather the Adj Close column for several ticker symbols at once.

import numpy as np
import pandas as pd
from pandas_datareader import data as pdr  # alias must match the pdr.get_data_yahoo call below
import matplotlib.pyplot as plt
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)

tickers = ['PG', 'MSFT', 'F', 'GE']
portfolio = pd.DataFrame()
# download each ticker and keep only its 'Adj Close' column
for t in tickers:
    portfolio[t] = pdr.get_data_yahoo(t, start="2017-01-01", end="2017-04-30")['Adj Close']

Upvotes: 1

Tony Shouse

Reputation: 136

The question is quite old, but here I am. On the yfinance project page on pypi.org I found a section titled 'pandas_datareader override'. It states,

"If your code uses pandas_datareader and you want to download data faster, you can "hijack" pandas_datareader.data.get_data_yahoo() method to use yfinance while making sure the returned data is in the same format as pandas_datareader's get_data_yahoo()."

They also provide the following code sample, which currently works:

from pandas_datareader import data as pdr

import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)

# download dataframe
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")

Upvotes: 0

Kamaldeep Singh

Reputation: 492

The fix_yahoo_finance package has been renamed to yfinance, so you can try this code:

import yfinance as yf
data = yf.download('MSFT', start = '2012-01-01', end='2017-01-01')

Upvotes: 7

Bora Savkar

Reputation: 75

Yahoo finance works well with pandas. Use it like this:

import pandas as pd
import pandas_datareader as pdr
from pandas_datareader import data as wb

ticker='GOOGL'
start_date='2019-1-1'
data_source='yahoo'

ticker_data=wb.DataReader(ticker,data_source=data_source,start=start_date)
df=pd.DataFrame(ticker_data)

Upvotes: 1

vibhu_singh

Reputation: 411

Try this out:

import fix_yahoo_finance as yf
data = yf.download('SPY', start = '2012-01-01', end='2017-01-01')

Upvotes: 2

Dipen Lama

Reputation: 147

Make the thread sleep between requests, after each symbol is read. This works most of the time, so try 5-6 times and save the data to a CSV file so that next time you can read it from the file.

### code is here ###
import pandas_datareader as web
import time
import datetime as dt
import pandas as pd

# example date range (not given in the original answer); adjust as needed
startDate = dt.datetime(2017, 1, 1)
endDate = dt.datetime(2017, 5, 17)

symbols = ['AAPL', 'MSFT', 'AABA', 'DB', 'GLD']
webData = pd.DataFrame()
for stockSymbol in symbols:
    webData[stockSymbol] = web.DataReader(stockSymbol, data_source='yahoo',
                                          start=startDate, end=endDate,
                                          retry_count=10)['Adj Close']
    time.sleep(22)  # pause for 22 seconds between requests
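
The "retry a few times, then save to CSV and read from the file next time" idea can be sketched with the standard library alone. This is only a rough outline, not part of the answer above: `fetch_with_retry`, `load_or_fetch` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return a list of rows of strings (for example a header row followed by data rows).

```python
import csv
import os
import time


def fetch_with_retry(fetch, symbol, attempts=5, pause=22.0, sleep=time.sleep):
    """Call fetch(symbol) up to `attempts` times (at least 1), pausing after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(symbol)
        except Exception as exc:  # broad on purpose: the error type depends on the data source
            last_error = exc
            if attempt < attempts - 1:
                sleep(pause)  # wait before the next attempt
    raise last_error


def load_or_fetch(path, fetch, symbol, **retry_kwargs):
    """Return the rows cached in the CSV at `path`, or fetch them and write the cache."""
    if os.path.exists(path):
        with open(path, newline="") as handle:
            return list(csv.reader(handle))
    rows = fetch_with_retry(fetch, symbol, **retry_kwargs)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

In practice `fetch` would wrap whatever download call you use and convert the result to rows; the second call for the same path never touches the network.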

Upvotes: 0

Alex L.

Reputation: 41

I changed from Yahoo to Google Finance and it works for me, so from

data.DataReader(ticker, 'yahoo', start_date, end_date)

to

data.DataReader(ticker, 'google', start_date, end_date)

and adapted my "old" Yahoo! symbols from:

tickers = ['AAPL','MSFT','GE','IBM','AA','DAL','UAL', 'PEP', 'KO']

to

tickers = ['NASDAQ:AAPL','NASDAQ:MSFT','NYSE:GE','NYSE:IBM','NYSE:AA','NYSE:DAL','NYSE:UAL', 'NYSE:PEP', 'NYSE:KO']
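
If you have many symbols, the prefixing can be done with a small lookup instead of editing the list by hand. A minimal sketch: `EXCHANGES` and `to_google_symbol` are made-up names, and the exchange assignments are simply copied from the two lists above.

```python
# Exchange lookup copied from the ticker lists above; extend it for other symbols.
EXCHANGES = {
    'AAPL': 'NASDAQ', 'MSFT': 'NASDAQ',
    'GE': 'NYSE', 'IBM': 'NYSE', 'AA': 'NYSE', 'DAL': 'NYSE',
    'UAL': 'NYSE', 'PEP': 'NYSE', 'KO': 'NYSE',
}


def to_google_symbol(ticker, exchanges=EXCHANGES):
    """Return 'EXCHANGE:TICKER'; raises KeyError if the exchange is not in the lookup."""
    return '{}:{}'.format(exchanges[ticker], ticker)


tickers = ['AAPL', 'MSFT', 'GE', 'IBM', 'AA', 'DAL', 'UAL', 'PEP', 'KO']
google_tickers = [to_google_symbol(t) for t in tickers]
```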

Upvotes: 2

artDeco

Reputation: 520

I found the workaround by "fix-yahoo-finance" in https://pypi.python.org/pypi/fix-yahoo-finance useful, for example:

from pandas_datareader import data as pdr
import fix_yahoo_finance

data = pdr.get_data_yahoo('AAPL', start='2017-04-23', end='2017-05-24')

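Note that the last two data columns are now ordered 'Adj Close', 'Volume', unlike the previous format. To restore the old column order:

# 'Date' is the index rather than a column, so it is not listed here
cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
data = data.reindex(columns=cols)  # reindex returns a new DataFrame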

Upvotes: 6

Scilear

Reputation: 238

So they've changed their URL and now use cookie protection (and possibly JavaScript), so I fixed my own problem using dryscrape, which emulates a browser. This is just an FYI, as it surely breaks their terms and conditions, so use it at your own risk. I'm looking at Quandl as an alternative EOD price source.

I could not get anywhere handling the cookies with a CookieJar, so I ended up using dryscrape to "fake" a user download:

import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re

#we visit the main page to initialise sessions and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

#call this once, as it is slow(er); after that you can do multiple downloads, though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()


#get the download link
soup = BeautifulSoup(response, 'lxml')
for taga in soup.findAll('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
print(url_download)

#now replace the default start date and end date that yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())

#now we replace the periods in the download link with our dates, please feel free to improve, I suck at regex
m = re.search('period1=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period1)        
m = re.search('period2=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period2)

#and now visit the link and get the body, and you have your csv
session.visit(url_download)
csv_data = session.body()

#and finally if you want to get a dataframe from it
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=[0], parse_dates=True)
df
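
On the "feel free to improve" part: one possible alternative to the regex step is to rewrite the query string with urllib.parse, which also avoids `str.replace` touching the same digits elsewhere in the URL. This is a Python 3 only sketch, not what I actually ran: `to_epoch` and `set_period` are made-up helper names, and dates are treated as midnight UTC here, whereas `time.mktime` above uses local time.

```python
import calendar
import datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_epoch(date_string):
    """Convert 'YYYY-MM-DD' to Unix seconds, treating the date as midnight UTC."""
    parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d")
    return calendar.timegm(parsed.timetuple())


def set_period(url, start, end):
    """Return `url` with its period1/period2 query parameters set to the given dates."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["period1"] = str(to_epoch(start))
    query["period2"] = str(to_epoch(end))
    # the other parameters (interval, crumb, ...) are kept as they were
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Usage would then be `url_download = set_period(url_download, "2017-02-18", "2017-05-18")` before the final `session.visit(url_download)`.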

Upvotes: 2
