Reputation: 27
I am trying to download a .zip file from https://www.fec.gov/data/browse-data/?tab=bulk-data, specifically https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip. Compressed, the file is 2.7 GB, yet the download starts and completes within 10 seconds. When I then try to unzip the file, I get the error messages below. When I download the same link to my local machine, it arrives as a .zip file and opens to the requested data.
!python --version
Python 3.7.8
!curl -O https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    690      0 --:--:-- --:--:-- --:--:--   690
!unzip -a indiv20.zip
Archive:  indiv20.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of indiv20.zip or
        indiv20.zip.zip, and cannot find indiv20.zip.ZIP, period.
import zipfile
with zipfile.ZipFile("indiv20.zip", 'r') as zip_ref:
    zip_ref.extractall()
Upvotes: 0
Views: 1130
Reputation: 300
It looks like the HTTP server is returning a redirect, and curl is storing the "302 Found" message in indiv20.zip instead of the actual ZIP data (the transfer output in the question shows only 138 bytes received). You can solve this by adding the -L (or --location) option to the curl command so that it follows redirects:
$ curl -LO https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
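Since the question is being run from a Python notebook, the same idea can be sketched in Python. This is a minimal sketch, not a tested recipe for this particular server: download_zip is a hypothetical helper name, and it assumes the server accepts requests from Python's default urllib client. urllib.request.urlopen follows HTTP redirects by default, which is roughly what curl -L does, and the zipfile.is_zipfile check fails early if an error page was saved instead of the archive.

```python
import shutil
import urllib.request
import zipfile


def download_zip(url, dest):
    """Download url to dest, following redirects, then check it is a ZIP.

    Hypothetical helper for illustration; not part of any library.
    """
    # urlopen follows HTTP redirects by default (similar to curl -L).
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so a multi-gigabyte archive is never held in memory.
        shutil.copyfileobj(response, out)

    # Fail early if the server sent something other than a ZIP archive,
    # for example a short HTML error or redirect page.
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{dest} is not a ZIP archive")
    return dest
```

It could then be called as download_zip("https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip", "indiv20.zip") before passing the file to zipfile.ZipFile.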
Upvotes: 1
Reputation: 420
Check the contents of the file, for example with cat indiv20.zip. It is probably an HTML error message rather than ZIP data.
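The same check can be done from Python, which is convenient in a notebook. The following is a minimal sketch; describe_download and its preview_bytes parameter are hypothetical names chosen for illustration. It reports the file size, whether zipfile recognizes the file as a ZIP archive, and a text preview of the first bytes, so an HTML error page is easy to spot.

```python
import os
import zipfile


def describe_download(path, preview_bytes=200):
    """Return size, ZIP validity, and a text preview of the file's start.

    Hypothetical helper for illustration; not part of any library.
    """
    with open(path, "rb") as f:
        head = f.read(preview_bytes)
    return {
        "size": os.path.getsize(path),
        # True only if the file has a valid ZIP end-of-central-directory record.
        "is_zip": zipfile.is_zipfile(path),
        # Undecodable bytes are replaced so binary data does not raise.
        "preview": head.decode("utf-8", errors="replace"),
    }
```

Calling describe_download("indiv20.zip") on a file of only a few hundred bytes whose preview starts with HTML would confirm that the server's response was saved instead of the archive.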
Upvotes: 0