Reputation: 69
My problem is that I have to read a big text file (several GB at least) and, while reading it, write portions of it to one of many output text files (about 5000) according to a pattern. Depending on which pattern is present, I need to write to the corresponding file.
I can create all 5000 text files beforehand, but I don't know how to access a specific file later to write to it. Efficiency is also a big concern, but I am not even there yet.
To make it clearer: there are 5000 distinct patterns, but they occur hundreds of millions of times in total, maybe more. Whenever I come across a specific pattern, I will write it to its text file. However, there is no order, so I may need the same output file again a million lines later, or just 3 lines later, whenever I see the pattern.
Thanks in advance. (Note: I am also a beginner in Python and I am using 3.6.)
Upvotes: 0
Views: 60
Reputation: 325
The built-in for opening files in Python is open(). In your case I would probably use it with mode="r" for the big file and mode="a" for all the other files. In append mode, Python will create a file if it is not already there, so there is no need to create them beforehand.
While reading the big file you can just specify the path to the file you want to write to as a string, so you can use string formatting on it.
with open(r"/BigFile.txt", mode="r") as InputFile:
    for row in InputFile:
        # Placeholder: replace ... with whatever determines which file to write to
        file_id = ...
        file_to_write_to = r"/Subfiles/outputfile{}.txt".format(file_id)
        # Mode "a" appends, and creates the file on first use
        with open(file_to_write_to, mode="a") as OutputFile:
            OutputFile.write(row)  # row already carries its own line ending
(The advantage of the with open() syntax is that you do not have to call the .close() method on the file object.)
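To make the loop above concrete, here is a minimal sketch under stated assumptions. The names route_lines and key_of are hypothetical, and the rule "the first whitespace-separated token of a line is the pattern" is only a stand-in for whatever decides the target file in your data. The demo runs on throwaway files in a temporary directory.

```python
import os
import tempfile


def key_of(row):
    # Assumption for illustration only: the first whitespace-separated
    # token of a line names the pattern, and therefore the output file.
    return row.split(maxsplit=1)[0]


def route_lines(input_path, output_dir):
    """Append each non-blank line of input_path to output_dir/outputfile<key>.txt."""
    os.makedirs(output_dir, exist_ok=True)
    with open(input_path, mode="r") as input_file:
        for row in input_file:
            if not row.strip():
                continue  # blank lines carry no key, so skip them
            target = os.path.join(output_dir, "outputfile{}.txt".format(key_of(row)))
            # Mode "a" creates the file on first use and appends afterwards.
            with open(target, mode="a") as output_file:
                output_file.write(row)  # row keeps its own line ending


# Small demo: three lines, two patterns, arbitrary order.
demo_dir = tempfile.mkdtemp()
demo_input = os.path.join(demo_dir, "big.txt")
with open(demo_input, "w") as f:
    f.write("A first\nB second\nA third\n")
route_lines(demo_input, os.path.join(demo_dir, "Subfiles"))
```

After the demo, outputfileA.txt holds the two "A" lines in their original order and outputfileB.txt holds the single "B" line. Nothing had to be created beforehand, and the file is found again later simply by rebuilding its path from the key.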
This code has the disadvantage that there is one file open and close operation per input line. You might want to consider collecting several output operations in memory and writing them out as a batch, but that will only save time if a batch contains multiple writes to the same file.
from collections import defaultdict

BATCH_SIZE = 500
batch_dict = defaultdict(list)  # file id -> lines waiting to be written

def flush(batch_dict):
    # Append each pending batch to its file with a single open/close
    for batch_id, rows in batch_dict.items():
        file_to_write_to = r"/Subfiles/outputfile{}.txt".format(batch_id)
        with open(file_to_write_to, mode="a") as OutputFile:
            OutputFile.writelines(rows)
    batch_dict.clear()

with open(r"/BigFile.txt", mode="r") as InputFile:
    for index, row in enumerate(InputFile, start=1):
        # Placeholder: replace ... with whatever determines which file to write to
        file_id = ...
        batch_dict[str(file_id)].append(row)
        if index % BATCH_SIZE == 0:
            flush(batch_dict)
flush(batch_dict)  # write whatever is left after the last full batch
(Code is untested as I don't have Python 3 right now)
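The batching idea can also be wrapped in a function so it is easy to try on a small file first. This is only a sketch: split_in_batches, key_of and the demo file names are hypothetical, and the demo uses the first character of each line as the key purely for illustration.

```python
import os
import tempfile
from collections import defaultdict


def split_in_batches(input_path, output_dir, key_of, batch_size=500):
    """Route lines to per-key files, opening each file at most once per batch."""
    os.makedirs(output_dir, exist_ok=True)
    pending = defaultdict(list)  # key -> lines read but not yet written

    def flush():
        for key, rows in pending.items():
            target = os.path.join(output_dir, "outputfile{}.txt".format(key))
            with open(target, mode="a") as output_file:
                output_file.writelines(rows)
        pending.clear()

    with open(input_path, mode="r") as input_file:
        for index, row in enumerate(input_file, start=1):
            pending[key_of(row)].append(row)
            if index % batch_size == 0:
                flush()
    flush()  # lines left over after the last full batch


# Demo: 7 lines alternating between keys "A" and "B", flushed every 3 lines.
batch_demo_dir = tempfile.mkdtemp()
batch_demo_input = os.path.join(batch_demo_dir, "big.txt")
with open(batch_demo_input, "w") as f:
    f.writelines("{} line{}\n".format("AB"[i % 2], i) for i in range(7))
split_in_batches(
    batch_demo_input,
    os.path.join(batch_demo_dir, "out"),
    key_of=lambda row: row[0],  # assumption: first character is the key
    batch_size=3,
)
```

At most batch_size lines are held in memory at any time, so a larger batch size trades memory for fewer open and close operations. Lines for the same key keep their original relative order, because batches are flushed in reading order.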
Upvotes: 3
Reputation: 6298
You should open the file in append mode only when it is needed, write your data, and then close it.
with open('my-file-name','a+') as ff:
    ff.write('my-text'+'\n')
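As a quick check that this works across separate opens, here is a small sketch. The helper name append_line and the temporary path are assumptions made for the demo. It uses plain "a" rather than "a+", since nothing is read back through the same handle.

```python
import os
import tempfile


def append_line(path, text):
    # Mode "a" creates the file if it is missing and always writes at the end.
    with open(path, "a") as ff:
        ff.write(text + "\n")


demo_path = os.path.join(tempfile.mkdtemp(), "my-file-name")
append_line(demo_path, "first")   # file does not exist yet, so it is created
append_line(demo_path, "second")  # reopened later, earlier content is kept
```

The file ends up containing both lines, so the same output file can be revisited at any point without keeping it open in between.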
Upvotes: 0