user131983

Reputation: 3927

IOError when decompressing gzip file

I'm trying to download and decompress a gzip file, then convert the resulting TSV file to CSV, which would be easier to parse. I want the data from the "Download Table" link in this URL. I am using the same idea as in this post, but I get the error IOError: Not a gzipped file on the line outfile.write(decompressedFile.read()). My code is as follows:

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format

Upvotes: 1

Views: 367

Answers (1)

Srgrn

Reputation: 1825

Basically, you are requesting the wrong file. If you check the response in your code, you will see that you get an HTML error page instead of a gzip file: you are appending your own local path to the URL, which produces an invalid URL.

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
print response
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format
    csvout.writerows(tsvin) #Copy every parsed row into the CSV file

The differences that fix the error are the filename line and a small addition to baseURL (the trailing file=). The filename is now filename = "data/irt_euryld_d.tsv.gz", which is the correct file name according to the link you specified.
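One way to confirm that the server sent an error page and not the archive is to look at the first two bytes of the response body before handing it to gzip. This is only a minimal sketch: it assumes the body has already been read into a byte string (for example with response.read()), and the names looks_like_gzip and decompress_or_explain are made up for this illustration.

```python
import gzip
import io

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(payload):
    """Return True if the payload starts with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC


def decompress_or_explain(payload):
    """Decompress a gzip payload, or raise a readable error if it is something else."""
    if not looks_like_gzip(payload):
        # Include the start of the body: an HTML error page is usually obvious here.
        raise ValueError("response is not gzip data; it starts with %r" % payload[:60])
    return gzip.GzipFile(fileobj=io.BytesIO(payload), mode="rb").read()
```

With the URL from the question, a check like this would presumably fail straight away and show the beginning of the HTML page, which points at the URL and not at the gzip handling.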

The other change is the line outFilePath = filename.split('/')[1][:-3]

which could be better written as

outFilePath = os.path.join('D:\\', 'Sidney', filename.split('/')[1][:-3])
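Splitting on '/' and taking index 1 only works while the remote name has exactly one directory level. A slightly more general sketch is shown here; local_path_for and out_dir are hypothetical names chosen for the example, not part of the code in the question.

```python
import os
import posixpath


def local_path_for(remote_name, out_dir):
    """Map a remote name such as 'data/x.tsv.gz' to '<out_dir>/x.tsv'."""
    # URL paths always use '/', so posixpath is used for the remote side.
    base = posixpath.basename(remote_name)
    if base.endswith(".gz"):
        base = base[:-len(".gz")]  # drop only the compression suffix
    # os.path.join uses the separator of the local platform.
    return os.path.join(out_dir, base)
```

On Windows, local_path_for("data/irt_euryld_d.tsv.gz", "D:\\Sidney") should give D:\Sidney\irt_euryld_d.tsv.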

Upvotes: 3
