tjmgis

Reputation: 1661

Read CSV from within Zip File

I have a directory of approximately 10,000 small zip files. Each one contains a CSV file that I am trying to read and split into a number of different CSV files.

I managed to write the code, shown below, to split CSV files from a directory of CSVs. It reads the first attribute of each row and, depending on its value, writes the row to the relevant CSV.

import csv
import os
import sys
import re
import glob

reader = csv.reader(open("C:/Projects/test.csv", "rb"), delimiter=',', quotechar='"')
write10 = csv.writer(open('ouput10.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
write15 = csv.writer(open('ouput15.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)


headings10=["RECORD_IDENTIFIER","CUSTODIAN_NAME","LOCAL_CUSTODIAN_NAME","PROCESS_DATE","VOLUME_NUMBER","ENTRY_DATE","TIME_STAMP","VERSION","FILE_TYPE"]
write10.writerow(headings10)

headings15=["RECORD_IDENTIFIER","CHANGE_TYPE","PRO_ORDER","USRN","STREET_DESCRIPTION","LOCALITY_NAME","TOWN_NAME","ADMINSTRATIVE_AREA","LANGUAGE"]
write15.writerow(headings15)


for row in reader:
    type = row[0]
    if "10" in type:        
        write10.writerow(row)
    elif "15" in type:
        write15.writerow(row)

So I am now trying to read the Zip files rather than wasting time extracting them first.

This is what I have so far, after following as many tutorials as I could find:

import glob
import os
import csv
import zipfile
import StringIO

for name in glob.glob('C:/Projects/abase/*.zip'):
    base = os.path.basename(name)
    filename = os.path.splitext(base)[0]


datadirectory = 'C:/Projects/abase/'
dataFile = filename
archive = '.'.join([dataFile, 'zip'])
fullpath = ''.join([datadirectory, archive])
csv = '.'.join([dataFile, 'csv'])


filehandle = open(fullpath, 'rb')
zfile = zipfile.ZipFile(filehandle)
data = StringIO.StringIO(zfile.read(csv))
reader = csv.reader(data)

for row in reader:
    print row

However, an error gets thrown:

AttributeError: 'str' object has no attribute 'reader'

Hopefully someone can show me how to change my working CSV reading code so that it reads from the Zip files.

Much appreciated

Tim

Upvotes: 18

Views: 20995

Answers (1)

benesch

Reputation: 5269

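Simple fix. You're shadowing the csv module with your local variable named csv, so by the time you call csv.reader, csv is a string, which is why you get the AttributeError. Just rename that variable. Note that the processing code also has to be indented so that it runs inside the for loop, once per archive: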

import glob
import os
import csv
import zipfile
import StringIO

for name in glob.glob('C:/Projects/abase/*.zip'):
    base = os.path.basename(name)
    filename = os.path.splitext(base)[0]


    datadirectory = 'C:/Projects/abase/'
    dataFile = filename
    archive = '.'.join([dataFile, 'zip'])
    fullpath = ''.join([datadirectory, archive])
    csv_file = '.'.join([dataFile, 'csv'])  # renamed from csv so the csv module is no longer shadowed


    filehandle = open(fullpath, 'rb')
    zfile = zipfile.ZipFile(filehandle)
    data = StringIO.StringIO(zfile.read(csv_file))  # use the renamed variable here too
    reader = csv.reader(data)

    for row in reader:
        print row
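
If you are on Python 3, the StringIO module and the print statement used above are gone, so here is a rough sketch of the same idea for that version. It assumes, as in your code, that each archive contains a CSV named after the archive, and that the files are UTF-8 encoded (the encoding is a guess on my part, adjust as needed). The function name read_csv_from_zip is just something I made up for illustration.

```python
import csv
import io
import os
import zipfile


def read_csv_from_zip(zip_path, member=None, encoding="utf-8"):
    """Return the rows of one CSV inside a zip archive as lists of strings.

    If member is None, assume the CSV shares the archive's base name
    (abc.zip contains abc.csv), as in the question.
    """
    if member is None:
        base = os.path.splitext(os.path.basename(zip_path))[0]
        member = base + ".csv"
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.open gives a binary stream; wrap it so csv.reader sees text.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))
```

You could then loop over glob.glob('C:/Projects/abase/*.zip'), call read_csv_from_zip(name) for each path, and pass the rows to your existing write10/write15 logic. Nothing is extracted to disk, and the csv module name is never reassigned.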

Upvotes: 20
