Reputation: 39
I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:
When you call csvFilename = gzip.open(filename, 'rb')
then reader = csv.reader(open(csvFilename))
, is that reader
not a valid csv file?
I am trying to solve the problem outlined below, and am getting a coercing to Unicode: need string or buffer, GzipFile found
error on line 41 and 7 (highlighted below), leading me to believe that the gzip.open and csv.reader do not work as I had previously thought.
Problem I am trying to solve
I am trying to take a results.csv.gz
and convert it to a results.csv
so that I can turn the results.csv
into a python dictionary and then combine it with another python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv
def dataToDict(filename):
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename)) # LINE 7
alertData={}
for row in reader:
alertData[row[0]]=row[1:]
return alertData
def mergeTwoDicts(dictA, dictB):
dictC = dictA.copy()
dictC.update(dictB)
return dictC
*edit: also forgive my non-PEP style of naming in Python
Upvotes: 0
Views: 10587
Reputation: 2372
The following worked for me for python==3.7.9
:
import gzip
my_filename = my_compressed_file.csv.gz
with gzip.open(my_filename, 'rt') as gz_file:
data = gz_file.read() # read decompressed data
with open(my_filename[:-3], 'wt') as out_file:
out_file.write(data) # write decompressed data
my_filename[:-3] is to get the actual filename so that it does get a random filename.
Upvotes: 2
Reputation: 155363
gzip.open
returns a file-like object (same as what plain open
returns), not the name of the decompressed file. Simply pass the result directly to csv.reader
and it will work (the csv.reader
will receive the decompressed lines). csv
does expect text though, so on Python 3 you need to open it to read as text (on Python 2 'rb'
is fine, the module doesn't deal with encodings, but then, neither does the csv
module). Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
Upvotes: 2