Reputation: 1
I am trying to split out one large file into an unknown number of files based on a field on a row by row basis. In this case, I want all records with a July 2016 date to write to one file, August 2016 to another, etc. I don't want to have to comb through the file twice, first to populate a list of files that need to be created, and then to actually write to them.
My first thought was to create a dictionary where the key is the file name (based on the date) and the value is an instance of a class that writes out to the corresponding csv file.
import csv

class testClass:
    a = None
    k = None
    def __init__(self,theFile):
        with open(theFile,'wb') as self.k:
            self.a = csv.writer(self.k)
    def writeOut(self,inString):
        self.a.writerow(inString)

testDict = {'07m19':testClass('07m19_test2')}
testDict['07m19'].writeOut(['test'])
When I try to run this I get the following error:
ValueError: I/O operation on closed file
This makes sense: by the time the class has finished initializing, the with block has exited and the file is closed.
I think the with statement is required because the files are very big and I can't load them entirely into memory. That being said, I am not sure how else to approach this.
Upvotes: 0
Views: 526
Reputation: 19432
I can't load it all into memory
You are not loading it all into memory by opening a file. You just create a file object. Only when you call f.read() do you load all of its contents into memory as a string.
So you can do:
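import csv

class testClass:
    def __init__(self,theFile):
        # Keep the file open for the object's lifetime (no with block).
        # 'wb' is for Python 2; on Python 3 use open(theFile, 'w', newline='').
        self.k = open(theFile,'wb')
        self.a = csv.writer(self.k)
    def writeOut(self,inString):
        self.a.writerow(inString)
    def __del__(self):
        # Close the file when the object is garbage-collected.
        self.k.close()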
Because it is not guaranteed that __del__ will be called at the end of execution, you might want to add a close method and call it explicitly, just as you would with files.
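As a rough sketch of that suggestion, the class could expose close() and, optionally, the context-manager methods so it can be used in a with statement. The name CsvSink is hypothetical, and the sketch assumes Python 3 (text mode with newline='' rather than 'wb'):

```python
import csv


class CsvSink:
    """Hypothetical variant of testClass with an explicit close()."""

    def __init__(self, path):
        # Assumes Python 3: open in text mode with newline='' for csv.
        self.k = open(path, 'w', newline='')
        self.a = csv.writer(self.k)

    def writeOut(self, row):
        self.a.writerow(row)

    def close(self):
        # Call this yourself instead of relying on __del__.
        self.k.close()

    # Optional: allow "with CsvSink(path) as sink:".
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions
```

With this, each object in the dictionary can be closed in a loop once the input has been read, or a single writer can be used as "with CsvSink('07m19_test2') as sink: sink.writeOut(['test'])".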
Upvotes: 1
Reputation: 114025
You don't necessarily need a class for this. Let's pretend that your input file is a csv file and has the full name of the month in the first column:
import csv

with open('path/to/input', newline='') as infile:
    for rownum,row in enumerate(csv.reader(infile),1):
        month = row[0]  # full month name in the first column
        # Append mode, so each month's file accumulates rows across iterations.
        with open('path/to/output_{}.csv'.format(month), 'a', newline='') as outfile:
            if not rownum%100: print("Processed row", rownum, end='\r', flush=True)
            csv.writer(outfile).writerow(row)
print("Processed row", rownum)
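Reopening the output file for every row keeps things simple but pays an open/close for each record. If that turns out to be slow, one possible variation is to keep one writer per month in a dictionary and let contextlib.ExitStack close all the files at the end. This is only a sketch: the function name split_by_month and its parameters are made up for illustration, it assumes Python 3 and that the month is in the first column of every row, and it opens outputs in 'w' mode, so existing files are overwritten rather than appended to.

```python
import csv
from contextlib import ExitStack


def split_by_month(in_path, out_pattern):
    """Split rows into one file per first-column value, in a single pass.

    out_pattern is a format string such as 'output_{}.csv'.
    Returns the number of rows processed.
    """
    writers = {}  # month -> csv.writer for that month's output file
    rownum = 0
    with ExitStack() as stack, open(in_path, newline='') as infile:
        for rownum, row in enumerate(csv.reader(infile), 1):
            month = row[0]
            if month not in writers:
                # First time this month is seen: open its file once and
                # register it so ExitStack closes it when the block exits.
                outfile = stack.enter_context(
                    open(out_pattern.format(month), 'w', newline=''))
                writers[month] = csv.writer(outfile)
            writers[month].writerow(row)
    return rownum
```

The number of files open at once equals the number of distinct months, which should stay well within typical operating-system limits for data keyed by month.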
Upvotes: 2
Reputation: 39404
def __init__(self,theFile):
    self.k = open(theFile,'wb')
    self.a = csv.writer(self.k)
Upvotes: -1