bream

Reputation: 168

What causes my code to inflate text file size?

I've written a Python program to go through the text files in a directory and create new versions of each one with added line numbers. Here is the relevant function in the program:

def create_lined_ver(filename):
    new_text = []

    with open(filename + ".txt", "r+") as f:
        text = f.readlines()
        for (num, line) in enumerate(text):
            new_text.append("[{0}]: ".format(num) + line)

    with open(filename + "_lined" + ".txt", "a+") as f:
        for line in new_text:
            f.write(line)

To test it, I ran it on a batch of text files and then, out of curiosity, ran it again (adding a second set of line numbers to the already-numbered files). I noticed that each time I ran the program, the newly created files were much larger than they should have been for adding ~5-6 characters per line. The file sizes jumped from 150 KB (original) to 700, 1800, and then 3000 KB on each subsequent run.

What's causing the file sizes to increase so much?
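To put the "~5-6 characters per line" expectation on firmer footing, here is a small sketch that estimates how large a file should be after one numbering pass. It assumes the same `"[{0}]: "` prefix and zero-based `enumerate()` numbering as the code above, and it counts characters, which equal bytes only for ASCII text. `expected_lined_size` is a hypothetical helper, not part of the original program.

```python
def expected_lined_size(text: str) -> int:
    """Estimate the length of `text` after one numbering pass (hypothetical helper).

    Each line gains the prefix "[n]: ", i.e. len(str(n)) + 4 characters,
    where n is the zero-based index produced by enumerate().
    """
    lines = text.splitlines(keepends=True)
    prefix_total = sum(len("[{0}]: ".format(num)) for num in range(len(lines)))
    return len(text) + prefix_total


sample = "alpha\nbeta\ngamma\n"
print(len(sample), expected_lined_size(sample))
```

A single pass should only add the prefix lengths, so an output several times larger than this estimate suggests that content is being duplicated rather than just prefixed.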

Upvotes: 1

Views: 159

Answers (3)

OneCricketeer

Reputation: 191701

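You don't need to build an intermediate list, and you shouldn't be appending to the output file.

You're looking for something like this, which opens the output in "w" mode so it is replaced rather than appended to: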

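def create_lined_ver(filename):
    # "w" truncates the output file, so rerunning does not accumulate old content
    with open(filename + ".txt") as f_in, open(filename + "_lined.txt", "w") as f_out:
        for num, line in enumerate(f_in):
            # each line read from the file already ends with "\n", so don't add another
            f_out.write("[{}]: {}".format(num, line))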
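One way to check the "w" behavior is to run the function twice on a scratch file and compare the output sizes. The sketch below assumes a throwaway file named `example.txt` inside a temporary directory, and it writes each line without an extra `"\n"` because lines read from the file already end in one.

```python
import os
import tempfile


def create_lined_ver(filename):
    # "w" truncates the output file, so a second run replaces the first run's output.
    with open(filename + ".txt") as f_in, open(filename + "_lined.txt", "w") as f_out:
        for num, line in enumerate(f_in):
            # `line` keeps its trailing "\n", so no extra newline is written here.
            f_out.write("[{}]: {}".format(num, line))


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "example")  # hypothetical scratch file name
    with open(base + ".txt", "w") as f:
        f.write("first\nsecond\nthird\n")

    create_lined_ver(base)
    size_after_first_run = os.path.getsize(base + "_lined.txt")
    create_lined_ver(base)
    size_after_second_run = os.path.getsize(base + "_lined.txt")

    with open(base + "_lined.txt") as f:
        result = f.read()

print(result, end="")
print(size_after_first_run == size_after_second_run)
```

Because the output is truncated on every run, the second call leaves a file of exactly the same size as the first.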

Upvotes: 1

piiipmatz

Reputation: 400

As pointed out in the comments, you are appending to the lined version every time you run the code. Instead, open it in "w" mode:

def create_lined_ver(filename):

    with open(filename + ".txt", "r") as f:
        text = f.readlines()

    new_text = ["[{0}]: ".format(num) + line for (num, line) in enumerate(text)]

    # "w" overwrites the file instead of appending to it
    with open(filename + "_lined" + ".txt", "w") as f:
        f.write(''.join(new_text))
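`str.join` expects an iterable of strings, so the list of numbered lines is passed to it directly. A quick check of the difference, using a made-up two-line `new_text`:

```python
new_text = ["[0]: first\n", "[1]: second\n"]

# str.join takes the list of strings directly.
joined = "".join(new_text)

# Wrapping the list in another list gives join a single element that is a list,
# not a string, so join raises TypeError.
try:
    "".join([new_text])
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(repr(joined))
print(error_message)
```

`f.writelines(new_text)` would write the same content without building the joined string first.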

Upvotes: 1

Larry M

Reputation: 21

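In line #9 (the second `with open(...)` statement), you open the output file with the "a+" flag. This opens the file for both appending and reading, so each run adds its text to the end of whatever an earlier run left in the file instead of replacing it. See the documentation of the `open()` modes for a description of each one. If you open the file in "w" mode instead, the existing file is truncated and overwritten.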
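A minimal sketch of the difference between the two modes, writing the same text twice to a scratch file in a temporary directory. `write_twice` is a hypothetical helper written for this illustration.

```python
import os
import tempfile


def write_twice(path, mode):
    """Write the same two lines to `path` twice with the given open() mode (hypothetical helper)."""
    for _ in range(2):
        with open(path, mode) as f:
            f.write("[0]: hello\n[1]: world\n")
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    appended = write_twice(os.path.join(tmp, "append.txt"), "a+")
    overwritten = write_twice(os.path.join(tmp, "write.txt"), "w")

print(appended.count("hello"))     # "a+" keeps both copies: 2
print(overwritten.count("hello"))  # "w" keeps only the last write: 1
```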

Upvotes: 2
