D.Rosado

Reputation: 5773

How to efficiently insert a new line at the start of a large file?

I want to insert a new line at the start of a 2GB+ file. I tried the following code, but it fails with an out-of-memory error.

myfile = open(tableTempFile, "r+")
myfile.read() # read everything in the file
myfile.seek(0) # rewind
myfile.write("WRITE IN THE FIRST LINE ")
myfile.close();
  1. How can I write to a file without loading the entire file into memory?
  2. How can I insert a new line at the start of the file?

Upvotes: 3

Views: 3155

Answers (4)

Louis Caron

Reputation: 1332

Based on the temporary-file solution from another answer:

import os
from itertools import chain, islice
from shutil import copy
from tempfile import NamedTemporaryFile

def add_lines_at_beginning(filename: str, text: str):
    # delete=False keeps the temporary file after it is closed, so it can be
    # copied back; it is removed explicitly at the end.
    # (Parenthesized context managers need Python 3.10+.)
    with (open(filename, 'r', encoding="utf-8") as infile,
        NamedTemporaryFile(mode='w+', encoding="utf-8", delete=False) as outfile):
        # replace the first line with the string provided:
        outfile.writelines(chain((text,), islice(infile, 1, None)))
        # if you don't want to replace the first line but to insert another line before
        # this simplifies to:
        #outfile.writelines(chain((text,), infile))
    copy(outfile.name, filename)
    os.remove(outfile.name)

Upvotes: 0

moooeeeep

Reputation: 32542

If you can afford having the entire file in memory at once:

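first_line_update = "WRITE IN THE FIRST LINE \n"
with open(tableTempFile, 'r+') as f:
  lines = f.readlines()
  lines[0] = first_line_update
  f.seek(0)      # rewind; otherwise the lines are appended after the old content
  f.writelines(lines)
  f.truncate()   # drop leftover bytes if the new content is shorter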

otherwise:

import os
from shutil import copy
from itertools import islice, chain
# TODO: use a NamedTemporaryFile from the tempfile module
first_line_update = "WRITE IN THE FIRST LINE \n"
with open("inputfile", 'r') as infile, open("tmpfile", 'w+') as outfile:
  # replace the first line with the string provided:
  outfile.writelines(chain((first_line_update,), islice(infile, 1, None)))
  # if you don't want to replace the first line but to insert another line before
  # this simplifies to:
  #outfile.writelines(chain((first_line_update,), infile))
copy("tmpfile", "inputfile")
os.remove("tmpfile")  # remove the temporary file

Upvotes: 3

codersofthedark

Reputation: 9655

Please note that Python has no built-in function that inserts text at the start of a file.

You can do this easily on Linux using tail / cat etc.

To do it in Python we must use an auxiliary file, and for very large files I think this method is a possibility:

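import fileinput

def add_line_at_start(filename, line_to_be_added):
    # inplace=True redirects print() output back into the file
    with fileinput.input(filename, inplace=True) as f:
        for xline in f:
            if f.isfirstline():
                print(line_to_be_added.rstrip('\r\n') + '\n' + xline, end='')
            else:
                print(xline, end='')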

NOTE:

  1. Never use read() / readlines() when you are dealing with big files. These methods try to load the complete file into memory.

  2. In your code, seek(0) takes you back to the start of the file, but everything you then write overwrites the existing content instead of being inserted before it.
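
To see the second point in practice, here is a small sketch on a throwaway file. The helper name overwrite_at_start and the demo file are made up for illustration; the point is only that writing after seek(0) replaces the bytes already there and shifts nothing.

```python
import os
import tempfile

def overwrite_at_start(path, text):
    """Write text at offset 0 of an existing file (overwrites, does not insert)."""
    with open(path, "r+") as f:
        f.seek(0)
        f.write(text)

# hypothetical demo file, created in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")
with open(demo_path, "w") as f:
    f.write("first\nsecond\n")

overwrite_at_start(demo_path, "XY")

with open(demo_path) as f:
    result = f.read()
print(repr(result))  # 'XYrst\nsecond\n'
```

The first two characters of "first" are gone, which is why the code in the question cannot be fixed just by dropping the read() call.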

Upvotes: 4

Kos

Reputation: 72279

Generally, you can't do that. A file is a sequence of bytes, not a sequence of lines. This data model doesn't allow insertions at arbitrary points - you can either replace a byte with another or append bytes at the end.

You can either:

  • Replace the first X bytes in the file. This could work for you if you can make sure that the first line's length will never vary.
  • Truncate the file, write the first line, then rewrite all the rest after it. If you can't fit the whole file into memory, then:
    • create a temporary file (the tempfile module will help you)
    • write your line to it
    • open your base file in 'r' mode and copy its contents to the temporary file, after the first line, piece-wise
    • close both files, then replace the input file with the temporary file
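
The temporary-file steps above could be sketched roughly as follows. The function name prepend_line, the 1 MiB chunk size, and the demo file are assumptions for illustration, not part of the original answer. The file is copied in binary mode so that only one chunk is held in memory at a time.

```python
import os
import shutil
import tempfile

def prepend_line(path, line, chunk_size=1024 * 1024):
    """Insert `line` before the existing content of `path`.

    Only about chunk_size bytes are held in memory at any time.
    """
    # Create the temporary file in the same directory so that the final
    # os.replace() stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
            tmp.write(line.encode("utf-8"))           # the new first line
            shutil.copyfileobj(src, tmp, chunk_size)  # piece-wise copy of the rest
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file in
    except BaseException:
        os.remove(tmp_path)
        raise

# hypothetical demo file standing in for the large input
demo_path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(demo_path, "wb") as f:
    f.write(b"a\nb\n")

prepend_line(demo_path, "WRITE IN THE FIRST LINE\n")

with open(demo_path, "rb") as f:
    contents = f.read()
```

Note that this still needs free disk space for a second copy of the file while it runs.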

(Note that appending to the end of a file is much easier - all you need to do is open the file in append mode, 'a'.)
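
The first option in the list (overwriting a fixed number of bytes) only works if the file was written with a fixed-width first line to begin with. A minimal sketch under that assumption is shown here; the 32-byte width, the helper names, and the demo file are all hypothetical.

```python
import os
import tempfile

HEADER_WIDTH = 32  # assumed fixed size of the first line, newline included

def format_header(text, width=HEADER_WIDTH):
    """Pad text with spaces so that the line is exactly `width` bytes long."""
    data = text.encode("utf-8")
    if len(data) > width - 1:
        raise ValueError("header text does not fit in the reserved width")
    return data.ljust(width - 1) + b"\n"

def replace_header(path, text):
    """Overwrite the first HEADER_WIDTH bytes in place; the rest is untouched."""
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(format_header(text))

# hypothetical demo file that already starts with a padded header line
demo_path = os.path.join(tempfile.mkdtemp(), "table.txt")
with open(demo_path, "wb") as f:
    f.write(format_header("old header") + b"row 1\nrow 2\n")

replace_header(demo_path, "WRITE IN THE FIRST LINE")

with open(demo_path, "rb") as f:
    first_line = f.readline()
    rest = f.read()
```

Because the new header has exactly the same length as the old one, no other byte in the file moves, so the cost does not depend on the file size.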

Upvotes: 2
