deversEatsALot

Reputation: 39

Converting a .csv.gz to .csv in Python 2.7

I have read the documentation and a few other posts on SO and elsewhere, but I can't quite figure out this concept:

When you call csvFilename = gzip.open(filename, 'rb') and then reader = csv.reader(open(csvFilename)), is that reader not reading a valid CSV file?

I am trying to solve the problem outlined below, and I am getting a "coercing to Unicode: need string or buffer, GzipFile found" error on lines 41 and 7 (marked below). This leads me to believe that gzip.open and csv.reader do not work the way I had thought.

Problem I am trying to solve

I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn results.csv into a Python dictionary and then combine it with another Python dictionary.

File 1:

alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)

Calls File 2:

import gzip
import csv

def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC

*Edit: also, forgive my non-PEP 8 naming style in Python.
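To see why dataToDict fails, here is a minimal Python 3 sketch (not from the question; the try_reopen helper and the temporary results.csv.gz are hypothetical) that hands the object returned by gzip.open to open(), exactly as line 7 does. On Python 3 this raises a TypeError; on Python 2.7 the same call produces the "coercing to Unicode" error quoted above.

```python
import gzip
import os
import tempfile


def try_reopen(path):
    """Repeat the question's mistake: hand gzip.open()'s result to open()."""
    gz_file = gzip.open(path, 'rb')  # returns a gzip.GzipFile object, not a filename
    try:
        open(gz_file)  # open() wants a path (or an int file descriptor), so this fails
    except TypeError as exc:
        return type(gz_file).__name__, exc
    finally:
        gz_file.close()
    return type(gz_file).__name__, None


# Write a tiny hypothetical results.csv.gz to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'results.csv.gz')
with gzip.open(path, 'wt', newline='') as f:
    f.write('host,web01\nuser,alice\n')

name, error = try_reopen(path)
print(name)  # GzipFile
print(repr(error))
```

The object is already a readable stream of decompressed data, so there is nothing left to "open".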

Upvotes: 0

Views: 10587

Answers (2)

Hafizur Rahman

Reputation: 2372

The following worked for me with Python 3.7.9:

import gzip

my_filename = 'my_compressed_file.csv.gz'

with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read() # read decompressed data
    with open(my_filename[:-3], 'wt') as out_file:
        out_file.write(data) # write decompressed data

my_filename[:-3] strips the trailing .gz, so the output keeps the original name (my_compressed_file.csv) instead of getting an arbitrary one.
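Reading the whole file with read() is fine for small files, but it loads everything into memory at once. A possible variant, sketched here with a hypothetical gunzip_file helper, streams the data with shutil.copyfileobj and opens both files in binary mode, so the bytes are copied unchanged (no decoding, no newline translation). It should also run on Python 2.7, since GzipFile supports the with statement there and shutil.copyfileobj is available.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(gz_path, out_path=None):
    """Decompress gz_path to out_path; by default, drop the trailing '.gz'."""
    if out_path is None:
        if not gz_path.endswith('.gz'):
            raise ValueError('expected a name ending in .gz: %r' % gz_path)
        out_path = gz_path[:-3]  # 'results.csv.gz' -> 'results.csv'
    # Binary mode on both ends: bytes are copied as-is, in chunks,
    # instead of being read into memory all at once.
    with gzip.open(gz_path, 'rb') as gz_file, open(out_path, 'wb') as out_file:
        shutil.copyfileobj(gz_file, out_file)
    return out_path


# Usage with a hypothetical results.csv.gz in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'results.csv.gz')
payload = b'host,web01\r\nuser,alice\r\n'
with gzip.open(src, 'wb') as f:
    f.write(payload)

csv_path = gunzip_file(src)
with open(csv_path, 'rb') as f:
    restored = f.read()
```

Binary mode also avoids having to guess the file's text encoding just to copy it.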

Upvotes: 2

ShadowRanger

Reputation: 155363

gzip.open returns a file-like object (the same kind of object plain open returns), not the name of a decompressed file. Pass the result directly to csv.reader and it will work: the csv.reader will receive the decompressed lines. The csv module does expect text, though, so on Python 3 you need to open the file in text mode. On Python 2, 'rb' is fine: the gzip module doesn't deal with encodings there, but neither does the csv module. Simply change:

csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))

to:

# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile)  # No reopening involved

# Python 3
csvFile = gzip.open(filename, 'rt', newline='')  # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile)  # No reopening involved
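Combining this fix with the question's two helpers, a minimal Python 3 sketch of dataToDict could look like the following. It uses a with statement so the file gets closed, and it skips blank rows, which csv.reader yields as empty lists and which would otherwise raise an IndexError on row[0]. The encoding='utf-8' argument, the sample data and the splunkParams stand-in are assumptions, not part of the original question.

```python
import csv
import gzip
import os
import tempfile


def dataToDict(filename):
    """Map the first column of each row in a .csv.gz file to the remaining columns."""
    # Text mode for csv on Python 3; newline='' lets csv handle line endings itself.
    # encoding='utf-8' is an assumption about the input file.
    with gzip.open(filename, 'rt', newline='', encoding='utf-8') as csv_file:
        alertData = {}
        for row in csv.reader(csv_file):  # rows come from the decompressed text
            if row:  # csv.reader yields [] for blank lines; skip them
                alertData[row[0]] = row[1:]
    return alertData


def mergeTwoDicts(dictA, dictB):
    """Return a new dict; keys in dictB override those in dictA."""
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC


# Usage with a hypothetical results.csv.gz in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'results.csv.gz')
with gzip.open(path, 'wt', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows([['host', 'web01', '42'], ['user', 'alice']])

alertDataCSV = dataToDict(path)
splunkParams = {'search_name': 'demo'}  # hypothetical stand-in for the real dict
alertDataTotal = mergeTwoDicts(splunkParams, alertDataCSV)
print(alertDataTotal)
```

No intermediate results.csv is written: the rows are parsed straight from the compressed file.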

Upvotes: 2
