I'm trying to do some data clean-up using Python. I have some large (1-2 GB) CSV files that I want to sort by some attribute (e.g. date, time) and then write out as another CSV file, so that the result can be opened in Excel.
As I iterate through the rows, I run into serious memory issues. Initially I was using 32-bit IDLE, which wouldn't run my code, so I switched to 64-bit Spyder. Now the code runs, but it stalls (it appears to be processing and memory is being consumed, but it hasn't moved on in the last half hour) at the first loop.
My code is as follows. The process halts at the `for row in rowreader:` line (marked with a comment). I'm pretty new to Python, so I'm sure my code is very primitive, but it's the best I can do! Thanks for your help in advance :)
```python
import csv

def file_reader(filename):
    "function takes string of file name and returns a list of lists"
    global master_list
    with open(filename, 'rt') as csvfile:
        rows = []
        master_list = []
        rowreader = csv.reader(csvfile, delimiter=',', quotechar='|')
        for row in rowreader:  # <-- the process halts here
            rows.append(','.join(row))
        for i in rows:
            master_list.append(i.replace(' ', '').replace('/2013', ',').split(","))
    return master_list

def trip_dateroute(date, route):
    dateroute_list = []
    for i in master_list:
        if str(i[1]) == date and str(i[3]) == route:
            dateroute_list.append(i)
    return dateroute_list

def output_csv(filename, listname):
    with open(filename, "w") as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quotechar='|', lineterminator='\n')
        for i in listname:
            writer.writerow(i)
```
If you don't need to hold the whole file content in memory, you can process each line and immediately write it to the output file. Also, in your example you parse the CSV and then generate CSV again, but you don't seem to make use of the parsed data. If that is correct, you could simply do this:
```python
def file_converter(infilename, outfilename):
    with open(infilename, 'rt') as infile, open(outfilename, "w") as outfile:
        for line in infile:
            # str.replace returns a new string, so the result must be kept
            line = line.replace(' ', '').replace('/2013', ',')
            outfile.write(line)
```
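Because `str.replace()` returns a new string rather than changing `line` in place, its result has to be assigned before it is written. Below is a minimal sketch of the same line-by-line idea, written against file objects instead of file names so it can be tried on small in-memory data with `io.StringIO` before pointing it at a 2 GB file. The helper name `convert_stream` and the sample data are hypothetical.

```python
import io


def convert_stream(infile, outfile):
    """Copy infile to outfile one line at a time, applying the same cleanup.

    Only the current line is held in memory, so memory use does not grow
    with the size of the input.
    """
    for line in infile:
        # str.replace returns a new string; `line` itself is left unchanged
        cleaned = line.replace(' ', '').replace('/2013', ',')
        outfile.write(cleaned)


src = io.StringIO("a, 01/05/2013, x\nb, 02/06/2013, y\n")
dst = io.StringIO()
convert_stream(src, dst)
print(dst.getvalue())
```

Once the output looks right, the same function can be called with real files opened via `open()`.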
If the function `trip_dateroute()` is used to filter the lines that should actually be written out, you can add that, too, but then you'd actually have to parse the CSV:
```python
import csv

def filter_row(row, date, route):
    return str(row[1]) == date and str(row[3]) == route

def cleanup(field):
    return field.replace(' ', '').replace('/2013', ',')

def file_converter(infilename, outfilename, date, route):
    with open(infilename, 'rt') as infile, open(outfilename, "w") as outfile:
        reader = csv.reader(infile, delimiter=',', quotechar='|')
        writer = csv.writer(outfile, delimiter=',', quotechar='|', lineterminator='\n')
        for row in reader:
            # Skip non-matching rows entirely instead of writing empty rows
            if filter_row(row, date, route):
                writer.writerow([cleanup(field) for field in row])
```
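Since `csv.writer.writerows()` accepts any iterable, the cleanup and filter can also be chained as generator expressions, so rows are still read, cleaned, tested and written one at a time. This sketch assumes, as in the code above, that column 1 holds the date and column 3 the route. It filters on the *cleaned* values, which mirrors the question's `trip_dateroute()` (it searched `master_list` after cleanup). For brevity the cleanup is reduced to removing spaces; the names `clean_field` and `stream_filter` and the sample data are hypothetical.

```python
import csv
import io


def clean_field(field):
    # Assumed cleanup for this sketch: just drop spaces.
    return field.replace(' ', '')


def stream_filter(infile, outfile, date, route):
    """Write only rows whose column 1 == date and column 3 == route."""
    reader = csv.reader(infile, delimiter=',', quotechar='|')
    writer = csv.writer(outfile, delimiter=',', quotechar='|', lineterminator='\n')
    # Both generator expressions are lazy: nothing is stored in a list.
    cleaned = ([clean_field(f) for f in row] for row in reader)
    matching = (
        row for row in cleaned
        if len(row) > 3 and row[1] == date and row[3] == route  # skip short/blank rows
    )
    writer.writerows(matching)


sample = io.StringIO(
    "1, 01/05/2013, 08:00, R1\n"
    "2, 01/05/2013, 09:00, R2\n"
    "3, 01/06/2013, 08:30, R1\n"
)
out = io.StringIO()
stream_filter(sample, out, '01/05/2013', 'R1')
print(out.getvalue())
```

Only the first sample row matches both the date and the route, so it is the only one written.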