Reputation: 7469

Selecting and printing specific rows of text file

I have a very large (~8 GB) text file that has very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. In fact, my question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.

So far this is the only approach I have gotten to work:

lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))

However, this gives me a list, and I would like to output a file with a format identical to the input file (one line per row).

Upvotes: 3

Answers (5)

Steinar Lima

Reputation: 7821

The first thing you should think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() does exactly that, so it should be avoided here.

Luckily, Python's excellent standard library includes the itertools module. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator, or a file object) and returns an iterator over the specified range:

itertools.islice(iterable, start, stop[, step])

Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line)

Using this information, together with the str.join method, you can e.g. extract the lines at zero-based indices 10-19 (the 11th through 20th lines of the file) by using this simple code:

from itertools import islice

# Open the file for reading; iterating over it yields one line at a time
with open('huge_data_file.txt', 'r') as data_file:
    txt = ''.join(islice(data_file, 10, 20))

Note that when looping over the file object, each line keeps its trailing newline char, so you should join on the empty string. Joining on \n would double every line break.
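To cover the two ranges from the question in one pass, islice can be applied repeatedly to the same file object. The following is only a sketch: the function name copy_line_ranges and the (start, stop, dest_path) job format are made up for illustration, and it assumes the ranges are zero-based, half-open, sorted, and non-overlapping.

```python
from itertools import islice


def copy_line_ranges(src_path, jobs):
    """Copy line ranges from src_path to separate files in a single pass.

    jobs is a list of (start, stop, dest_path) tuples (a hypothetical
    format), sorted by start and non-overlapping. Indices are zero-based
    and stop is exclusive, as in ordinary slicing.
    """
    with open(src_path, 'r') as src:
        position = 0  # index of the next line the file iterator will yield
        for start, stop, dest_path in jobs:
            with open(dest_path, 'w') as dest:
                # islice counts from wherever the iterator currently is,
                # so the offsets are taken relative to position.
                dest.writelines(islice(src, start - position, stop - position))
            position = stop
```

With the question's slices, the call would be copy_line_ranges('huge_data_file.txt', [(5, 67, 'out1.txt'), (89, 111, 'out2.txt')]). Only one line is held in memory at a time, and reading stops after the last requested range.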

Upvotes: 1

aeroNotAuto

Reputation: 280

Might I suggest not storing the entire file in memory (since it is large), as per one of your links?

f = open('file')
n = open('newfile', 'w')
for i, text in enumerate(f):
    # enumerate counts from 0; keep indices 5-67 and 89-111
    if 4 < i < 68 or 88 < i < 112:
        n.write(text)
f.close()
n.close()

I'd also recommend using 'with' instead of opening and closing the files by hand, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(
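For anyone on Python 2.7 or newer, the same loop using 'with' might look roughly like this. The function name copy_selected and its two path parameters are made up for illustration; the index bounds are the ones used in the loop above.

```python
def copy_selected(src_path, dest_path):
    """Copy the lines at zero-based indices 5-67 and 89-111 to dest_path."""
    # Both files are closed automatically when the block exits,
    # even if an exception is raised part-way through.
    with open(src_path) as f, open(dest_path, 'w') as n:
        for i, text in enumerate(f):
            if 4 < i < 68 or 88 < i < 112:
                n.write(text)
```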

Upvotes: 2

Bob

Reputation: 3351

You can call join on the ranges.

lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))

Upvotes: 4

Mike K

Reputation: 105

path = "c:\\someplace\\"

Open two text files, one for reading and one for writing:

f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')

Go through each line of the input file:

for line in f_in:
    # i_want_to_write_this_line is a placeholder for your own condition
    if i_want_to_write_this_line:
        f_out.write(line)

Close the files when done:

f_in.close()
f_out.close()
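One possible way to fill in the i_want_to_write_this_line condition is a small generator that checks each line's index against the wanted ranges. This is a sketch only: the name select_lines is hypothetical, and it assumes Python 3, where range objects have a stop attribute and support fast membership tests.

```python
def select_lines(lines, ranges):
    """Yield the lines whose zero-based index falls in any of ranges.

    lines is any iterable of lines (for example an open file) and
    ranges is a non-empty list of range objects.
    """
    last = max(r.stop for r in ranges)
    for index, line in enumerate(lines):
        if index >= last:
            break  # no later line can match, so stop reading early
        if any(index in r for r in ranges):
            yield line
```

It could then be used as f_out.writelines(select_lines(f_in, [range(5, 67), range(89, 111)])). Stopping at the last requested index matters for a file this size, since the rest of it is never read.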

Upvotes: 0

Manoj Govindan

Reputation: 74715

(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:

lines = readin.readlines()

for each in lines[5:67]:
    out1.write(each)

for each in lines[89:111]:
    out2.write(each)

Upvotes: 0
