Thierry Lam
Thierry Lam

Reputation: 46284

Does the Python "open" function save its content in memory or in a temp file?

For the following Python code:

fp = open('output.txt', 'wb')
# Very big file, writes a lot of lines, n is a very large number
for i in range(1, n):
    fp.write('something' * n)
fp.close()

The writing process above can last more than 30 min. Sometimes I get the error MemoryError. Is the content of the file before closing stored in memory or written in a temp file? If it's in a temporary file, what is its general location on a Linux OS?

Edit:

Added fp.write in a for loop

Upvotes: 8

Views: 4750

Answers (6)

jcdyer
jcdyer

Reputation: 19165

Building on ataylor's comment to the question:

You might want to nest your loop. Something like

for i in range(1,n):
    for each in range n:
        fp.write('something')
fp.close()

That way, the only thing that gets put into memory is the string "something", not "something" * n.

Upvotes: 3

Tendayi Mawushe
Tendayi Mawushe

Reputation: 26128

If you a writing out a large file for which the writes might fail you a better off flushing the file to disk yourself at regular intervals using fp.flush(). This way the file will be in a location of your choosing that you can easily get to rather than being at the mercy of the OS:

fp = open('output.txt', 'wb')
counter = 0
for line in many_lines:
    file.write(line)
    counter += 1
    if counter > 999:
        fp.flush()
fp.close()

This will flush the file to disk every 1000 lines.

Upvotes: 1

rob
rob

Reputation: 37644

File writing should never give a memory error; with all probability, you have some bug in another place.

If you have a loop, and a memory error, then I would look if you are "leaking" references to objects.
Something like:

def do_something(a, b = []):
    b.append(a)
    return b

fp = open('output.txt', 'wb') 

for i in range(1, n): 
    something = do_something(i)
    fp.write(something)

fp.close()

I am now picking just an example, but in your actual case the reference leak may be much more difficult to find; however this case will just leak memory inside do_something because of the way Python handles default parameters of functions.

Upvotes: 0

ghostdog74
ghostdog74

Reputation: 342609

If you write line by line, it should not be a problem. You should show the code of what you are doing before the write. For a start you can try to delete objects where not necessary, use fp.flush() etc..

Upvotes: 0

unwind
unwind

Reputation: 399949

There will be write buffering in the Linux kernel, but at (ir)regular intervals they will be flushed to disk. Running out of such buffer space should never cause an application-level memory error; the buffers should empty before that happens, pausing the application while doing so.

Upvotes: 3

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 799044

It's stored in the operating system's disk cache in memory until it is flushed to disk, either implicitly due to timing or space issues, or explicitly via fp.flush().

Upvotes: 5

Related Questions