Reputation: 131
I've gone around in circles on this one. A bit frustrating as the solution is probably close at hand.
Anyway, I found a URL that returns some data in CSV format. However, the URL itself does not contain the CSV file name. In a web browser, I can go to the link and then I'm asked whether I want to open or save the file. So ultimately I know I'm getting a CSV file with a name. I'm just not sure how to do this in Python, as there seems to be an intermediate data type being passed (bytes).
I've tried the following to no avail:
import urllib.request  # the request submodule has to be imported explicitly
import io
import pandas as pd
link = r'http://www.cboe.com/products/vix-index-volatility/vix-options-and-futures/vix-index/vix-historical-data/'
f = urllib.request.urlopen(link)
myfile = f.read()
buf = io.BytesIO(myfile) # originally tried io.StringIO(myfile) but then realized myfile is in bytes
df = pd.read_csv(buf)
Any suggestions?
The df should contain data that looks similar to:
1/5/2004,18.45,18.49,17.44,17.49
1/6/2004,17.66,17.67,16.19,16.73
1/7/2004,16.72,16.75,15.5,15.5
1/8/2004,15.42,15.68,15.32,15.61
1/9/2004,16.15,16.88,15.57,16.75
1/12/2004,17.32,17.46,16.79,16.82
Here is the last line of the error message:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 24, saw 4
Upvotes: 1
Views: 1176
Reputation: 1839
This is not really an answer, just a heads-up that the link from CBOE is not valid at the moment (from 2020-DEC-07 to today, 2020-DEC-23), and I'm not sure whether the URL will come back. There is data in a similar format on datahub.io, but it is not up to date; the free CHRIS data via Quandl is not up to date either. I have yet to find an official notice from CBOE stating that this URL will no longer be supported. I posted a similar question/finding on QuantConnect:
https://www.quantconnect.com/forum/discussion/7673/problem-pulling-cboe-vix-data-on-live-trading/p1
import pandas as pd
url='http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv'
df = pd.read_csv(url)
print(df.shape)
/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
648 class HTTPDefaultErrorHandler(BaseHandler):
649 def http_error_default(self, req, fp, code, msg, hdrs):
--> 650 raise HTTPError(req.full_url, code, msg, hdrs, fp)
651
652 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 404: NOT FOUND
The above URL from CBOE no longer seems to be working.
Outdated data can be obtained from datahub.io and Quandl:
url = 'https://datahub.io/zelima1/finance-vix/r/vix-daily.csv'
df = pd.read_csv(url)
print(df.shape)
print(df.Date)
(3488, 5)
0 2004-01-02
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
...
3483 2017-11-01
3484 2017-11-02
3485 2017-11-03
3486 2017-11-06
3487 2017-11-07
Name: Date, Length: 3488, dtype: object
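To see how stale a source is, it may help to parse the Date column (it comes back as dtype object above) and look at the latest date. A rough sketch that uses a few inline rows in place of the download; the "Close" column name and the numeric values are placeholders I made up, only "Date" and the dates come from the output above:

```python
import io
import pandas as pd

# Inline stand-in for the downloaded file. Only the "Date" column name and the
# dates are taken from the output above; "Close" and its values are placeholders.
sample = io.StringIO(
    "Date,Close\n"
    "2017-11-03,1.0\n"
    "2017-11-06,2.0\n"
    "2017-11-07,3.0\n"
)

# parse_dates turns the Date column into datetime64 instead of object.
df = pd.read_csv(sample, parse_dates=["Date"])

last_date = df["Date"].max()
# "Today" is hard-coded to the date of this post so the result is reproducible.
age_days = (pd.Timestamp("2020-12-23") - last_date).days

print(last_date.date(), age_days)
```

With the real file you would presumably pass the datahub.io URL to pd.read_csv instead of the inline sample.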
Quandl CHRIS VIX:
https://www.quandl.com/data/CHRIS/CBOE_VX1-S-P-500-Volatility-Index-VIX-Futures-Continuous-Contract-1-VX1-Front-Month
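If you need something that keeps running while the CBOE URL is down, one option is to try the sources in order and fall back to the next one on an HTTP error. A minimal sketch, not a tested production setup: first_available and fake_loader are names I made up, and fake_loader only stands in for pd.read_csv so the example runs offline.

```python
import urllib.error

# Both URLs are the ones quoted in this thread; the order is a judgment call.
SOURCES = [
    "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv",
    "https://datahub.io/zelima1/finance-vix/r/vix-daily.csv",
]


def first_available(urls, loader):
    """Return (url, result) for the first URL that loader can fetch."""
    errors = []
    for url in urls:
        try:
            return url, loader(url)
        except urllib.error.URLError as exc:  # HTTPError is a subclass of URLError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("no source available:\n" + "\n".join(errors))


def fake_loader(url):
    """Hypothetical stand-in for pd.read_csv: the CBOE URL answers 404."""
    if "cboe" in url:
        raise urllib.error.HTTPError(url, 404, "NOT FOUND", None, None)
    return "data from " + url


used_url, data = first_available(SOURCES, fake_loader)
print(used_url)
```

In real use you would call first_available(SOURCES, pd.read_csv), since the traceback above shows pd.read_csv surfacing urllib's HTTPError. Keep in mind the fallback data is outdated, so it is worth logging which URL was actually used.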
Upvotes: 1
Reputation: 156
@Fred - I think that you are simply using the wrong URL. When I replace the link with http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv, your script works.
I found this URL on the page your script originally pointed to.
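For what it's worth, a ParserError like the one in the question is what you would typically see when the response body is an HTML page rather than CSV, which is presumably what the original link returns. Here is a small sketch for checking that before handing the bytes to pd.read_csv; looks_like_html is a helper name I made up, the check is only a heuristic, and the two payloads are invented stand-ins for f.read():

```python
def looks_like_html(raw: bytes, content_type: str = "") -> bool:
    """Heuristic: True if the payload appears to be an HTML page, not CSV."""
    if "html" in content_type.lower():
        return True
    # Fall back to sniffing the first bytes when the header is missing or vague.
    head = raw[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Hypothetical payloads standing in for f.read() on the two kinds of URL.
html_page = b"<!DOCTYPE html>\n<html><head><title>VIX</title></head></html>"
csv_file = b"1/5/2004,18.45,18.49,17.44,17.49\n1/6/2004,17.66,17.67,16.19,16.73\n"

print(looks_like_html(html_page, "text/html; charset=utf-8"))  # True
print(looks_like_html(csv_file, "text/csv"))                   # False
```

With urllib the header should be available as f.headers.get("Content-Type", "") on the object returned by urlopen. If the check comes back False, pd.read_csv(io.BytesIO(myfile)) from the question should be fine as written.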
Upvotes: 1