Fred Flores

Reputation: 131

Importing data from URL using Python (into pandas dataframe)?

I've gone around in circles on this one. A bit frustrating as the solution is probably close at hand.

Anyway, I found a URL that returns some data in CSV format. However, the URL itself does not contain the CSV file name. In a web browser, I can go to the link and then I'm asked whether I want to open or save the file, so I know I'm ultimately getting a CSV file with a name. I'm just not sure how to do this in Python, as there seems to be an intermediate data type (bytes) being passed.

I've tried the following to no avail:

import urllib.request  # the request submodule must be imported explicitly
import io
import pandas as pd
link = r'http://www.cboe.com/products/vix-index-volatility/vix-options-and-futures/vix-index/vix-historical-data/'
f = urllib.request.urlopen(link)
myfile = f.read()
buf = io.BytesIO(myfile)  # originally tried io.StringIO(myfile) but then realized myfile is in bytes
df = pd.read_csv(buf)

Any suggestions?

The df should contain data that looks similar to:

1/5/2004,18.45,18.49,17.44,17.49
1/6/2004,17.66,17.67,16.19,16.73
1/7/2004,16.72,16.75,15.5,15.5
1/8/2004,15.42,15.68,15.32,15.61
1/9/2004,16.15,16.88,15.57,16.75
1/12/2004,17.32,17.46,16.79,16.82

Here is the last line of the error message:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 24, saw 4

Upvotes: 1

Views: 1176

Answers (2)

pangyuteng

Reputation: 1839

This is not really an answer, just a note that the CBOE link is not valid at the moment (from 2020-DEC-07 through today, 2020-DEC-23), and I'm not sure whether the URL will come back. datahub.io offers data in a similar format, but it is not up to date; the free CHRIS data via Quandl is not up to date either. I have yet to find an official notice from CBOE stating that this URL will no longer be supported. I posted a similar question/finding on QuantConnect:

https://www.quantconnect.com/forum/discussion/7673/problem-pulling-cboe-vix-data-on-live-trading/p1

import pandas as pd
url='http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv'
df = pd.read_csv(url)
print(df.shape)
/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 class HTTPDefaultErrorHandler(BaseHandler):
    649     def http_error_default(self, req, fp, code, msg, hdrs):
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    651 
    652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: NOT FOUND

The CBOE URL above no longer seems to be working.

Outdated data can be obtained from datahub.io and Quandl:

url = 'https://datahub.io/zelima1/finance-vix/r/vix-daily.csv'
df = pd.read_csv(url)
print(df.shape)
print(df.Date)
(3488, 5)
0       2004-01-02
1       2004-01-05
2       2004-01-06
3       2004-01-07
4       2004-01-08
           ...    
3483    2017-11-01
3484    2017-11-02
3485    2017-11-03
3486    2017-11-06
3487    2017-11-07
Name: Date, Length: 3488, dtype: object
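
Since the CBOE URL currently returns a 404 while the datahub.io copy still loads, one option is to try the sources in order and use the first one that responds. The following is only a rough sketch: `first_available` and `fake_loader` are hypothetical helper names, not part of pandas or urllib, and the demonstration uses a stub loader so it runs without network access.

```python
import urllib.error


def first_available(urls, loader):
    """Return (url, data) for the first URL that loads without an HTTP/URL error."""
    failures = []
    for url in urls:
        try:
            return url, loader(url)
        except urllib.error.URLError as exc:  # HTTPError is a subclass of URLError
            failures.append(f"{url}: {exc}")
    raise RuntimeError("no source could be loaded:\n" + "\n".join(failures))


def fake_loader(url):
    """Stub standing in for a real loader; mimics the 404 seen for the CBOE URL."""
    if "cboe.com" in url:
        raise urllib.error.HTTPError(url, 404, "NOT FOUND", None, None)
    return "stub data"


sources = [
    "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv",
    "https://datahub.io/zelima1/finance-vix/r/vix-daily.csv",
]
used_url, data = first_available(sources, fake_loader)
print(used_url)
```

In real use, `pd.read_csv` could be passed as the loader in place of `fake_loader`. Keep in mind that the fallback source stops at 2017-11-07, so it is a stopgap and not a replacement.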

Quandl CHRIS VIX:

https://www.quandl.com/data/CHRIS/CBOE_VX1-S-P-500-Volatility-Index-VIX-Futures-Continuous-Contract-1-VX1-Front-Month

Upvotes: 1

Ben G.

Reputation: 156

@Fred - I think that you are simply using the wrong URL. When I replace the link with http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv, your script works.

I found this URL on the page your script originally pointed to.
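
A likely explanation for the ParserError in the question is that the original link returns an HTML page, not a CSV file, so pandas ends up tokenizing markup. Below is a minimal sketch of a check that could be run on the downloaded bytes before parsing. The helper names (`looks_like_html`, `read_csv_bytes`, `fetch_bytes`) are hypothetical, the HTML check is only a heuristic, and the sample payloads (including the column names) are made up for illustration and are not real CBOE responses.

```python
import io
import urllib.request


def looks_like_html(payload: bytes) -> bool:
    """Heuristic: does the payload start like an HTML document?"""
    head = payload[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def read_csv_bytes(payload: bytes):
    """Parse CSV bytes with pandas, refusing payloads that look like HTML."""
    if looks_like_html(payload):
        raise ValueError("response looks like an HTML page, not a CSV file")
    import pandas as pd  # imported lazily so the check above works without pandas
    return pd.read_csv(io.BytesIO(payload))


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Offline demonstration with made-up payloads (assumed, not fetched).
html_payload = b"<!DOCTYPE html>\n<html><head><title>VIX</title></head></html>"
csv_payload = b"Date,Open,High,Low,Close\n1/5/2004,18.45,18.49,17.44,17.49\n"

print(looks_like_html(html_payload))
print(looks_like_html(csv_payload))
```

With a working CSV URL, `read_csv_bytes(fetch_bytes(url))` would be the equivalent of the script in the question, with an earlier and clearer failure when the URL points at a web page.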

Upvotes: 1
