Reputation: 7469
I have a very large (~8 GB) text file with very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. My question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.
So far this is the only approach I have gotten to work:
lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))
However, this gives me the string representation of a list, and I would like to output a file with a format identical to the input file (one line per row).
Upvotes: 3
Views: 13914
Reputation: 7821
The first thing you should think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() will do that, so that specific method should be avoided.
Luckily, Python has an excellent standard library, which includes the itertools module. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator, or a file-like object) and returns an iterator over the specified range:
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line)
Using this information, together with the str.join method, you can e.g. extract lines 10-19 (counting from zero) by using this simple code:
from itertools import islice
# Open the file for reading; the default text mode works on every platform
with open('huge_data_file.txt') as data_file:
    txt = ''.join(islice(data_file, 10, 20))
Note that when looping over a file object, each line keeps its trailing newline character, so join with the empty string rather than with \n.
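To write several ranges to separate files in a single pass, as the question asks, the same idea can be extended. The following is only a sketch: copy_line_ranges and its jobs argument are hypothetical names, and it assumes the ranges are zero-based, half-open, sorted, and non-overlapping. Because islice counts from the iterator's current position, each range has to be shifted by the number of lines already consumed.

```python
from itertools import islice


def copy_line_ranges(src_path, jobs):
    """Copy line ranges from src_path into separate files in one pass.

    jobs is a list of (start, stop, dest_path) tuples (hypothetical layout).
    Ranges are zero-based and half-open, like list slices, and must be
    sorted and non-overlapping.
    """
    position = 0  # number of lines already consumed from the source
    with open(src_path) as src:
        for start, stop, dest_path in jobs:
            with open(dest_path, 'w') as dest:
                # islice counts relative to the current position of src,
                # so shift the requested range accordingly.
                dest.writelines(islice(src, start - position, stop - position))
            position = stop
```

For the ranges in the question this would be called as copy_line_ranges('huge_data_file.txt', [(5, 67, 'out1.txt'), (89, 111, 'out2.txt')]), where the output file names are placeholders. Only one line is held in memory at a time.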
Upvotes: 1
Reputation: 280
Might I suggest not storing the entire file in memory (since it is large), as per one of your links?
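f = open('file')
n = open('newfile', 'w')
# enumerate counts from zero, so these tests select indices 5-67 and 89-111
for i, text in enumerate(f):
    if i > 4 and i < 68:
        n.write(text)
    elif i > 88 and i < 112:
        n.write(text)
# close both files explicitly, since 'with' is not used here
f.close()
n.close()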
I'd also recommend using 'with' instead of opening and closing the files manually, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here.
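For readers who can use 'with', the same enumerate loop might look like the sketch below. The function name split_by_index and its parameters are hypothetical, and the ranges are taken from the question's slices (lines[5:67] and lines[89:111]), so the upper bounds are exclusive here. Unlike the loop above, it writes each range to its own file and stops reading once the last range has been passed.

```python
def split_by_index(src_path, out1_path, out2_path):
    """Write zero-based lines 5-66 to out1_path and 89-110 to out2_path."""
    with open(src_path) as src, \
            open(out1_path, 'w') as out1, \
            open(out2_path, 'w') as out2:
        for i, line in enumerate(src):
            if 5 <= i < 67:
                out1.write(line)
            elif 89 <= i < 111:
                out2.write(line)
            elif i >= 111:
                # Nothing left to copy, so skip the rest of the file.
                break
```

The early break matters for a multi-gigabyte input: without it the loop would still read every remaining line just to discard it.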
Upvotes: 2
Reputation: 3351
You can call join on the ranges.
lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
Upvotes: 4
Reputation: 105
path = "c:\\someplace\\"
# Open two text files: one for reading and one for writing
# (output_name is a placeholder for your output file name)
f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')
# Go through each line of the input file
for line in f_in:
    # i_want_to_write_this_line is a placeholder for your own condition
    if i_want_to_write_this_line:
        f_out.write(line)
# Close the files when done
f_in.close()
f_out.close()
Upvotes: 0
Reputation: 74715
(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:
lines = readin.readlines()
for each in lines[5:67]:
    out1.write(each)
for each in lines[89:111]:
    out2.write(each)
Upvotes: 0