Reputation: 3813
I would like to split a very large .txt file into equal parts, each part containing N lines, and save the parts to a folder.
with open('eg.txt', 'r') as T:
    while True:
        next_n_lines = islice(T, 300)
        f = open("split" + str(x.pop()) + ".txt", "w")
        f.write(str(next_n_lines))
        f.close()
But this creates files that contain the text "<itertools.islice object at 0x7f8fa94a4940>" instead of the lines from the original file.
I would like to preserve the same structure and style maintained in the original txt file.
This code also does not terminate when it reaches the end of the file. If possible, I would like the code to stop writing files and quit once there is no data left to write.
Upvotes: 3
Views: 3500
Reputation: 180540
You can use iter with islice, taking n lines at a time, and enumerate to give your files unique names. f.writelines will write each list of lines to a new file:
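from itertools import islice

with open('eg.txt') as T:
    # iter(callable, sentinel) keeps calling the lambda until it returns []
    for i, sli in enumerate(iter(lambda: list(islice(T, 300)), []), 1):
        with open("split_{}.txt".format(i), "w") as f:
            f.writelines(sli)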
Your code loops forever because it has no break condition. Calling iter with an empty list as the sentinel means the loop ends as soon as the file iterator has been exhausted.
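To make the two-argument form of iter more concrete, here is a minimal sketch. It uses an in-memory io.StringIO as a stand-in for the open file, and the names source and next_chunk are only illustrative assumptions:

```python
from io import StringIO
from itertools import islice

# In-memory stand-in for an open text file (hypothetical sample data).
source = StringIO("a\nb\nc\nd\ne\n")

def next_chunk():
    # Returns up to 2 lines; returns [] once the source is exhausted.
    return list(islice(source, 2))

# iter(callable, sentinel) calls next_chunk() until it returns the sentinel [].
chunks = list(iter(next_chunk, []))
print(chunks)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

The last chunk is simply shorter than the others, so no empty file would be produced for it.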
Also, if you wanted to write an islice object directly, you would just call writelines on it, i.e. f.writelines(next_n_lines), rather than writing str(next_n_lines).
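As a rough sketch of that last point, the islice object can be handed to writelines without building a list first. The helper below is hypothetical (the name split_file, its parameters, and the output naming are assumptions, not part of the question); it peeks one line per part so that it can stop at the end of the file, and it saves the parts to a folder:

```python
import os
from itertools import islice

def split_file(path, out_dir, n):
    """Split the file at `path` into parts of at most `n` lines (n >= 1).

    Hypothetical helper: name, parameters and file naming are assumptions.
    Returns the number of part files written to `out_dir`.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(path) as src:
        while True:
            # Peek one line so no empty file is created at end of input.
            first = next(src, None)
            if first is None:
                break
            count += 1
            out_path = os.path.join(out_dir, "split_{}.txt".format(count))
            with open(out_path, "w") as dst:
                dst.write(first)
                # writelines consumes the islice lazily, one line at a time.
                dst.writelines(islice(src, n - 1))
    return count
```

Something like split_file('eg.txt', 'parts', 300) would then write parts/split_1.txt, parts/split_2.txt and so on.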
Upvotes: 6
Reputation: 107347
The problem is that itertools.islice returns an iterator, and you are writing its str to your file. For an islice object that is just the default representation Python gives the object (showing its type and identity):
<itertools.islice object at 0x7f8fa94a4940>
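A small sketch of the difference, using a made-up list of lines in place of the file (the variable names are only illustrative):

```python
from itertools import islice

# Hypothetical sample lines standing in for the file contents.
lines = iter(["first\n", "second\n", "third\n"])
chunk = islice(lines, 2)

# str() does not read any lines; it only describes the iterator object.
as_repr = str(chunk)

# Joining (or writelines) actually consumes the iterator and yields the text.
as_text = "".join(chunk)
print(as_text)
```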
As a more Pythonic way of slicing an iterator into equal parts, you can use the following grouper function, which is listed among the itertools recipes in the Python documentation:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
You can pass your file object as an iterator to the function, then loop over the result and write each partition to your output file:
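with open('eg.txt', 'r') as T:
    for partition in grouper(T, 300):
        # do anything with `partition`, like joining the lines,
        # or any modification you like. Then write it to the output.
        # The last partition is padded with None (the fillvalue), so skip those.
        text = ''.join(line for line in partition if line is not None)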
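One detail to keep in mind is that grouper pads the final group up to n items with fillvalue. The sketch below shows that padding and one possible way around it: passing fillvalue="" so the padding vanishes when the lines are joined. The io.StringIO object is only an assumed stand-in for the real file:

```python
from io import StringIO
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# The last group is padded up to n items with the fillvalue.
padded = list(grouper("ABCDEFG", 3, "x"))
print(padded)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

# Hypothetical in-memory stand-in for the open file.
source = StringIO("a\nb\nc\nd\ne\n")

# With fillvalue="" the padding contributes nothing to the joined text.
partitions = ["".join(p) for p in grouper(source, 2, fillvalue="")]
print(partitions)  # ['a\nb\n', 'c\nd\n', 'e\n']
```

Each string in partitions could then be written to its own file with f.write.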
Upvotes: 2