Reputation: 99

Is this python file.seek() routine correct?

This routine looks OK to me, but it ends up writing rubbish to the file. lines_of_interest is a set of lines (896227L, 425200L, 640221L, etc.) that need to be changed in the file. The if/else block determines what is changed on that line. This is the first time I have used seek(), but I believe the syntax is correct. Can anyone spot any errors in the code that would stop it from working correctly?

outfile = open(OversightFile, 'r+')
for lines in lines_of_interest:
        for change_this in outfile:
            line = change_this.decode('utf8', 'replace')
            outfile.seek(lines)
            if replacevalue in line:
                line = line.replace(replacevalue, addValue)
                outfile.write(line.encode('utf8', 'replace'))
                break#Only check 1 line
            elif not addValue in line:
                #line.extend(('_w\t1\t'))
                line = line.replace("\t\n", addValue+"\n")
                outfile.write(line.encode('utf8', 'replace'))
                break#Only check 1 line
outfile.close()

Upvotes: 1

Answers (2)

7stud

Reputation: 48599

You should think of files as unchangeable (unless you want to append to the file). If you want to change existing lines in a file, here are the steps:

1) Read each line from your input file, e.g. data.txt
2) Write every line, including the changed lines, to an output file, e.g. new_file.txt
3) Delete the input file.
4) Rename the output file to the input file name.

One problem you don't want to have to deal with in step 2) is trying to conjure up a filename that doesn't already exist. The tempfile module will do that for you.
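The four steps above can be sketched by hand as follows. This is only a minimal illustration written for Python 3, not the code from the question: the helper name rewrite_lines and its transform parameter are made up for the example, and it assumes the file is UTF-8 text.

```python
import os
import tempfile


def rewrite_lines(path, transform):
    """Rewrite the file at `path`, passing every line through `transform`.

    `rewrite_lines` and `transform` are hypothetical names for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Steps 1 and 2: read each line and write it (changed or not) to a
    # temporary file. tempfile picks a name that does not exist yet, and
    # creating it next to the original keeps the rename on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False,
                                     encoding="utf8") as tmp:
        with open(path, "r", encoding="utf8") as src:
            for line in src:
                tmp.write(transform(line))
    # Steps 3 and 4: os.replace swaps the new file in under the old name.
    os.replace(tmp.name, path)
```

Calling rewrite_lines('data.txt', lambda line: "***" + line) would prefix every line of data.txt, assuming that file exists.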

The fileinput module can be used to do all those steps transparently:

#1.py
import fileinput as fi

f = fi.FileInput('data.txt', inplace=True)

for line in f:
    print("***" + line.rstrip())  # with inplace=True, print output goes into data.txt

f.close()

--output:--
$ cat data.txt
abc
def
ghi
$ python 1.py 
$ cat data.txt
***abc
***def
***ghi

The fileinput module opens the filename you give it and renames the file. Then print statements are directed into an empty file created with the original name. When you are done, the renamed file is deleted (or you can specify that it should remain).

Upvotes: 2

Martijn Pieters

Reputation: 1121784

You are both looping over the file and seeking in it, multiple times, but never reset the position before reading again.

In the first iteration, you read the first line, then you seek elsewhere in the file, write at that position, and break out of the for change_this in outfile: loop.

The next iteration of the for lines in lines_of_interest: loop then starts reading from outfile again, but the file is now positioned at the point where the last outfile.write() left off. That means you are now reading whatever followed the data you have just written.

This is probably not what you wanted to do.
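To see the effect in isolation, here is a small Python 3 sketch on a throwaway file (the file contents and variable names are invented for the demonstration). It reads a line, seeks, writes, and then reads again without seeking:

```python
import os
import tempfile

# Build a small throwaway file to experiment on.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line one\nline two\nline three\n")

with open(path, "r+b") as f:
    first = f.readline()             # reads b"line one\n"
    f.seek(9)                        # byte 9 is the start of "line two"
    f.write(b"LINE")                 # overwrites 4 bytes
    position_after_write = f.tell()  # the position moved past the written bytes
    next_read = f.readline()         # continues from there, not from the top

os.remove(path)

print(position_after_write)  # 13
print(next_read)             # b' two\n'
```

The second read picks up in the middle of the line that was just modified, which is the situation described above.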

If you wanted to read the line from the same position, and write it back to the same location, you need to seek first and use .readline() instead of iteration to read your line. Then seek again before writing:

outfile = open(OversightFile, 'r+')

for position in lines_of_interest:
    outfile.seek(position)  # jump to the start of the line
    line = outfile.readline().decode('utf8', 'replace')
    outfile.seek(position)  # go back so the write lands on the same line
    if replacevalue in line:
        line = line.replace(replacevalue, addValue)
        outfile.write(line.encode('utf8'))
    elif addValue not in line:
        line = line.replace("\t\n", addValue + "\n")
        outfile.write(line.encode('utf8'))

outfile.close()

Note, however, that if you write out data that is shorter or longer than the original line, the file size will not adjust! Writing a longer line will overwrite the first characters of the next line; writing a shorter line will leave the trailing characters of the old line in the file.
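Both cases are easy to reproduce. The following Python 3 sketch uses a made-up helper, overwrite_at, that writes some bytes at a given offset of a temporary file and returns the resulting contents:

```python
import os
import tempfile


def overwrite_at(original, offset, replacement):
    """Write `replacement` at `offset` into a temp file holding `original`.

    Hypothetical helper for this demonstration; returns the file's bytes.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(original)
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(replacement)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# The middle line b"bbb\n" starts at byte 4 and is 4 bytes long.
longer = overwrite_at(b"aaa\nbbb\nccc\n", 4, b"BBBBB\n")
shorter = overwrite_at(b"aaa\nbbb\nccc\n", 4, b"B\n")

print(longer)   # b'aaa\nBBBBB\nc\n'   (the start of the third line is gone)
print(shorter)  # b'aaa\nB\nb\nccc\n'  (a leftover b"b\n" from the old line)
```

If the replacement can ever differ in length from the original line, rewriting the whole file is the safer route.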

Upvotes: 1
