Reputation: 2033
I have written code that reads a large (>15 GB) text file and converts the data to a CSV file one line at a time.
txt_file = fileName + ".txt"
csv_file = fileName + ".csv"
with open(txt_file, "r") as tf, open(csv_file, "w") as cf:
    for line in tf:
        cf.writelines(" ".join(line.split()).replace(' ', ','))
        cf.write("\n")
edit:
As for the data,
Data in input file:
abc def ghi jkl
Expected data in output file:
abc,def,ghi,jkl
I am using Python 2.7.6 on Mac OS X 10.10.3.
Upvotes: 0
Views: 1866
Reputation: 3560
The easiest way to do it is to buffer the lines and write them out in batches:
with open("file.json", "r") as r, open("write.csv", "a") as w:
lines = []
for l in r:
#Process l
if len(lines) < 1000000: #Only uses 54mb of RAM so I hear
lines.append(l)
else:
w.writelines(lines)
del lines[:]
Upvotes: -1
Reputation: 2040
I know this is not technically answering your question, but if you can process the file before your Python script runs, I believe using sed would be the fastest way to do this. Given your large file size, I think the non-Python suggestion is worth making.
How to replace space with comma using sed
You can do this from the command line before starting your Python script, or even invoke it from within your script using subprocess, as sketched below.
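A minimal sketch of the subprocess route, assuming sed is on the PATH, the fields never contain spaces that should be preserved, and fileName is the same variable as in the question:

import subprocess

txt_file = fileName + ".txt"
csv_file = fileName + ".csv"

with open(csv_file, "w") as cf:
    # Equivalent to running:  sed 's/[[:space:]]\{1,\}/,/g' input.txt > output.csv
    # i.e. replace each run of whitespace with a single comma, line by line.
    subprocess.check_call(["sed", "s/[[:space:]]\\{1,\\}/,/g", txt_file], stdout=cf)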
Upvotes: 0
Reputation: 1124548
Leave parsing and formatting CSV to the csv module:
import csv

txt_file = fileName + ".txt"
csv_file = fileName + ".csv"

with open(txt_file, "rb") as tf, open(csv_file, "wb") as cf:
    reader = csv.reader(tf, delimiter=' ')
    writer = csv.writer(cf)
    writer.writerows(reader)
or, if you have strange quoting, treat the input file as text and split manually:
import csv

txt_file = fileName + ".txt"
csv_file = fileName + ".csv"

with open(txt_file, "rb") as tf, open(csv_file, "wb") as cf:
    writer = csv.writer(cf)
    writer.writerows(line.split() for line in tf)
File streams are buffered, so data is already read and written in chunks; there is no need to batch lines yourself.
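If you want to experiment with the write buffer size anyway, Python 2's built-in open() accepts a third buffering argument. A minimal sketch, reusing txt_file and csv_file from the snippets above (the 1 MB figure is an arbitrary choice, not a recommendation):

import csv

buffer_size = 1024 * 1024  # 1 MB write buffer; adjust to taste

with open(txt_file, "rb") as tf, open(csv_file, "wb", buffer_size) as cf:
    writer = csv.writer(cf)
    writer.writerows(line.split() for line in tf)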
Upvotes: 2