Reputation: 13
I've been trying to learn file I/O in Python, but I've run into what looks like a memory leak that I can't explain.
file = "D:\\babelStorage\\Testing"
x = 1000000
while (x > 0):
    with open("".join([file, "\\", "junk", str(x), ".txt"]), "wt") as trash:
        trash.write("garbage")
    x = x - 1
The same issue seems to occur even when I explicitly use trash.close(). What exactly am I doing wrong that's causing huge chunks of memory to accumulate?
None of the memory shows up under any process in Task Manager. If I run it long enough, I can lose 10 GB, which ends up... somewhere. Closing the Python shell doesn't recover the memory either; I have to reboot.
Upvotes: 1
Views: 469
Reputation: 41
I know this question was asked two years ago, but I wanted to give an answer in case someone who came across this article wanted an answer. There are two major things wrong with this code snippet:
1. How your loop is constructed
The way your loop is constructed is more error-prone than it needs to be. You're updating the same variable x 1000000 times by hand. Each time you run x = x - 1, Python creates a new integer object and rebinds x to it; the old object is freed once nothing refers to it, so this shouldn't pile up memory on its own, but it's extra bookkeeping you don't need. A better way to construct this loop would be to use a for loop. Like this:
for x in range(0, 1000000):
    *do code*
Basically, the above for loop lets range() keep track of x, and terminates once x has gone through every value below 1000000. Strictly speaking, range(0, 1000000) isn't a generator in Python 3; it's a lazy sequence object, but it shares the property that matters here: it produces each integer on demand instead of storing every item in memory. This makes it well suited to going over large ranges of numbers without putting a big burden on your memory. "range(x, y)" creates an object that goes from the integer x up to, but not including, the integer y. The above code snippet is telling Python to *do code* for each item in range(0, 1000000), which has 1000000 items: each integer from 0 to 999999. (In Python 2, range() built a full list and xrange() was the lazy version.) I highly suggest reading a little on generators in Python and the range() function; they're fundamental to know IMO.
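As a quick check of the memory point above, here's a minimal sketch (assuming CPython 3; the variable names are mine, not from the question) that compares a range object against a fully built list of the same numbers:

```python
import sys
from collections.abc import Iterator

N = 1000000

lazy = range(0, N)          # computes each value on demand
eager = list(range(0, N))   # stores a reference to every integer up front

range_size = sys.getsizeof(lazy)   # size of the range object itself
list_size = sys.getsizeof(eager)   # size of the list's reference array (ints not included)

print(range_size, list_size)

# range is a sequence rather than an iterator/generator:
# it supports len() and indexing, and can be looped over more than once
is_iterator = isinstance(lazy, Iterator)
print(len(lazy), lazy[-1], is_iterator)
```

The exact byte counts depend on your Python build, but the range object should stay tiny no matter how large N gets, while the list grows with N.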
2. What you're doing with each iteration of your loop
In the code you provided, you're telling Python to run these lines 1000000 times:
with open("".join([file, "\\", "junk", str(x), ".txt"]), "wt") as trash:
    trash.write("garbage")
This code opens a file, writes a single line to it, and closes the file afterwards. Problem is, because str(x) is part of the file name, doing this 1000000 times means you're creating, opening and closing 1000000 separate files! If you don't actually need one file per number, that doesn't make sense: you should only need to open a file once and close it once. Open it when you start writing to it, and close it when you're done writing to it. The fix is simple: put the "with open" statement before your loop and give it a fixed file name (there's no x before the loop, so it can't be part of the name). Which would look something like this:
with open("".join([file, "\\", "junk", ".txt"]), "wt") as trash:
    for i in range(0, 1000000):
        trash.write("garbage")
This way, you're opening your file first, then going through your for loop instead of opening and closing a file with each iteration of your loop. Once you've done these two fixes, your code should run fine.
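If you want to try this without touching a real drive, here's a small sketch that writes into a temporary directory instead. The names write_lines, base_dir and count are hypothetical, and junk.txt is just an assumed file name, not something from the question's setup.

```python
import os
import tempfile


def write_lines(base_dir, count):
    """Open one file once, write `count` lines to it, and return its path."""
    path = os.path.join(base_dir, "junk.txt")  # hypothetical fixed file name
    with open(path, "wt") as trash:
        for i in range(0, count):
            trash.write("garbage\n")
    # the with block has already closed the file by this point
    return path


with tempfile.TemporaryDirectory() as base_dir:
    path = write_lines(base_dir, 1000)
    # read the file back to confirm every line was written
    with open(path, "rt") as check:
        line_count = sum(1 for _ in check)
    print(line_count)
```

If you really do want one file per number, keep the open() call inside the loop and put str(i) back into the file name; os.path.join saves you from gluing the backslashes together by hand.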
Upvotes: 2