Reputation: 15803
Is it possible to read a .zip file that contains only a .dta file from a URL?
For example, https://www.federalreserve.gov/econres/files/scfp2016s.zip contains one file, rscfp2016.dta, but pandas.read_stata doesn't work for it:
import pandas as pd
pd.read_stata('https://www.federalreserve.gov/econres/files/scfp2016s.zip')
ValueError: Version of given Stata file is not 104, 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)
read_csv supports reading a zipped file when the archive contains only the CSV, via the compression argument, which defaults to inferring the compression. read_stata lacks this option.
I could do it by downloading and unzipping the file, then reading it, but this is messy.
!wget https://www.federalreserve.gov/econres/files/scfp2016s.zip
!unzip scfp2016s.zip
df = pd.read_stata('rscfp2016.dta')
Any better way?
Upvotes: 2
Views: 624
Reputation: 7214
You can try it with requests:
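import io
import zipfile
import pandas as pd
import requests
response = requests.get('https://www.federalreserve.gov/econres/files/scfp2016s.zip')
response.raise_for_status()  # stop early if the download failed
# Open the archive in memory and read its first (and only) member
a = zipfile.ZipFile(io.BytesIO(response.content))
b = a.read(a.namelist()[0])
df = pd.read_stata(io.BytesIO(b))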
Upvotes: 1
Reputation: 249223
read_stata accepts file-like objects, so you can do this:
import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = 'https://www.federalreserve.gov/econres/files/scfp2016s.zip'
with urlopen(url) as request:
    data = BytesIO(request.read())
with ZipFile(data) as archive:
    with archive.open(archive.namelist()[0]) as stata:
        df = pd.read_stata(stata)
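Both approaches pick the member with namelist()[0], which works here because the archive holds a single file. If an archive might carry extra files (a readme, say), selecting the member by extension could be safer. A minimal sketch, using only the standard library; open_single_member and its suffix parameter are hypothetical names, not part of pandas or zipfile, and the archive built at the bottom is a made-up stand-in for the real download:

```python
import io
import zipfile


def open_single_member(zip_bytes, suffix=".dta"):
    """Return a BytesIO holding the only archive member ending in `suffix`.

    Hypothetical helper for illustration; not part of pandas or zipfile.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Match on the file extension instead of relying on member order.
        matches = [n for n in archive.namelist() if n.lower().endswith(suffix)]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one {suffix} member, found {matches}")
        return io.BytesIO(archive.read(matches[0]))


# Build a tiny in-memory archive so the helper can be tried without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "not the data")
    archive.writestr("example.dta", b"placeholder bytes")

member = open_single_member(buffer.getvalue())
print(member.read())  # b'placeholder bytes'
```

With the real file, the bytes from request.read() would presumably be passed as zip_bytes and the returned buffer handed to pd.read_stata.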
Upvotes: 1