Reputation: 3580
I installed Python 3.5 (against some of your suggestions) and cannot figure out why the new code won't load the zip file from a URL:
import pandas as pd
import numpy as np
from io import StringIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip")
#Download Zipfile and create pandas DataFrame
zipfile = ZipFile(StringIO(url.read()))
FFdata = pd.read_csv(zipfile.open('F-F_Research_Data_Factors.CSV'),
header = 0, names = ['Date','MKT-RF','SMB','HML','RF'],
skiprows=3)
I believe it's failing on the urlopen function, but it also doesn't work when I pass the URL as a plain text string instead.
Does anyone know what's happening? Thank you!
Upvotes: 1
Views: 2248
Reputation: 77337
Running your program, I get this error:
Traceback (most recent call last):
  File "c.py", line 9, in <module>
    zipfile = ZipFile(StringIO(url.read()))
TypeError: initial_value must be str or None, not bytes
A quick test confirms that the problem is that you are passing a bytes object to StringIO.
td@mintyfresh ~/tmp $ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> io.StringIO(b'aaa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: initial_value must be str or None, not bytes
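The solution is simple: use an io.BytesIO object instead. In Python 3, url.read() returns bytes, and ZipFile needs a binary file-like object anyway. This is a common error because StringIO would have worked in Python 2, where it accepted byte strings, and lots of examples are 2.x based.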
import pandas as pd
import numpy as np
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip")
# Download the zip file and create a pandas DataFrame
zipfile = ZipFile(BytesIO(url.read()))
FFdata = pd.read_csv(zipfile.open('F-F_Research_Data_Factors.CSV'),
                     header=0, names=['Date','MKT-RF','SMB','HML','RF'],
                     skiprows=3)
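If you want to check the bytes-versus-str behaviour without hitting the network, here is a minimal sketch using only the standard library. It builds a tiny zip archive in memory as a stand-in for url.read(), then reads it back through BytesIO. The helper names (make_zip_bytes, read_csv_rows_from_zip), the member name sample.csv, and the CSV values are made up for illustration and are not part of the original data set.

```python
import csv
import io
from zipfile import ZipFile


def make_zip_bytes(member_name, text):
    """Build a small zip archive in memory and return its raw bytes.

    Hypothetical stand-in for url.read(), which also returns bytes in Python 3.
    """
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr(member_name, text)
    return buffer.getvalue()


def read_csv_rows_from_zip(payload, member_name):
    """Wrap zipped bytes in BytesIO and parse one archive member as CSV."""
    with ZipFile(io.BytesIO(payload)) as archive:
        with archive.open(member_name) as raw:
            # archive.open() yields a binary stream; decode it for csv.reader
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Made-up sample data, only to exercise the code path
payload = make_zip_bytes("sample.csv", "Date,MKT-RF\n200001,1.5\n")
rows = read_csv_rows_from_zip(payload, "sample.csv")

# Passing the same bytes to StringIO reproduces the TypeError from the question
try:
    io.StringIO(payload)
    stringio_error = None
except TypeError as exc:
    stringio_error = str(exc)

print(rows)
print(stringio_error)
```

The same payload that BytesIO accepts is rejected by StringIO, which is the whole difference between the failing and the working version above.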
Upvotes: 3