F1rools22

Reputation: 117

Increasing the efficiency of appending data to a file

I am testing the speed of different ways of appending data to the end of a file.

The file is saved as a .txt. The contents of the file are a list of dicts.

Example dict: {'posted': ['2020-09-06T22:27:56.849149+00:00', '2020-09-06T22:27:56.849149+00:00'], 'seller_name': ['cheesetoken', 'cheesetoken'], 'seller_is_NPC': [False, False], 'Listings_sold': 2, 'quality': 2, 'price': 0.256, 'quantity_sold': 296554, 'datetime': datetime.datetime(2020, 9, 7, 0, 22, 27, 490902)} I'll shorten this to {data} for simplicity's sake.

The file will continue to get larger with time, but currently is 1MB in size and will increase by approx 1MB every 14-21 days.

I want to append data to this list. The data I want to append will itself be a list. If I had [{data1},{data2},{data3},{data4}] saved to disk already and I wanted to append [{data5},{data6}], I'd want to be able to easily read the data back (it doesn't have to be saved like this) as [{data1},{data2},{data3},{data4},{data5},{data6}].

My original code to do this was:

    for x in formatted_sell_list:
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop)
        if len(x) > 0:

            try:
                with open(os.path.join(path, str(file_name)) + '.txt', "r") as file1:
                    data = eval(file1.read())

            except:
                # print('Error no file to read: ' + str(db_file_name) + '.txt')
                data = []

            data = data + content

            with open(os.path.join(path, str(file_name)) + '.txt', "w") as file1:  # Overwriting
                file1.write(str(data))

        loop = loop + 1

I felt this was probably quite an inefficient way of doing this: reading the entire file, evaling it, appending to the list and overwriting. I decided line-by-line appending might work better, so I used this:

    for x in formatted_sell_list:
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop) +' NEW'
        if len(x) > 0:

            for write_me in content:
                # Open the file in append & read mode ('a+')
                with open(os.path.join(path, str(file_name)) + '.txt', "a+") as file_object:

                    # Append text at the end of file
                    file_object.write(str(write_me))
                    file_object.write("\n")

        loop = loop + 1

I ran these alongside each other and timed how long each section of code took using time.time(). I found that in 100% of cases (file sizes between 1KB and 1.3MB) the old method was faster. On average it ran 4.5 times faster. Further testing showed that the most time-intensive part of the second piece of code was, by far, opening the file.
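A rough sketch of how this comparison could be reproduced in isolation. The function names, the sample rows and the temporary folder are all made up for illustration and are not part of the code above; it uses time.perf_counter() rather than time.time() because it is better suited to short intervals.

```python
import os
import tempfile
import time


def append_reopening(path, lines):
    """Open the file once per line (the pattern in the second snippet)."""
    for line in lines:
        with open(path, "a") as fh:
            fh.write(line + "\n")


def append_once(path, lines):
    """Open the file once and write every line through the same handle."""
    with open(path, "a") as fh:
        for line in lines:
            fh.write(line + "\n")


def timed(func, path, lines):
    """Return the elapsed seconds for one call of func(path, lines)."""
    start = time.perf_counter()
    func(path, lines)
    return time.perf_counter() - start


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    # Made-up sample rows standing in for the real sales dicts.
    sample = [str({"price": 0.256, "quality": 2})] * 2000
    for func in (append_reopening, append_once):
        target = os.path.join(folder, func.__name__ + ".txt")
        print(func.__name__, round(timed(func, target, sample), 4), "s")
```

Both functions should leave identical file contents, so any difference in the printed times comes from how often the file is opened rather than from what is written.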

Any suggestions to make this code faster/more efficient would be hugely appreciated.

Edited code:

    for x in formatted_sell_list:
        # print('loop = ' + str(loop))
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop) +' NEW'
        # print('Writing to ' + str(db_file_name) + ", " + str(content))
        if len(x) > 0:

            # Open the file in append & read mode ('a+')
            with open(os.path.join(r'C:\Users\PC\PycharmProjects\Simcompanies\Files\RecordedSales2',
                                   str(file_name)) + '.txt', "a+") as file_object:

                for write_me in content:

                    # Append text at the end of file
                    file_object.write(str(write_me))
                    file_object.write("\n")

Upvotes: 0

Views: 3001

Answers (1)

smcjones

Reputation: 5600

I/O operations are expensive.

Keep your writing to a minimum. Format your list into the string format you want and then perform one write operation.

Something like this:

    with open(file, 'a') as fh:  # 'a' appends; the default mode is read-only
        fh.write('\n'.join(map(str, content)) + '\n')
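A fuller sketch of the same idea, under some assumptions. It swaps str() for JSON Lines (one json.dumps() per row) so the file can be read back without eval; that is a substitution on my part, not something the question uses. The names append_records and read_records are hypothetical, and default=str means datetime values come back as plain strings rather than datetime objects.

```python
import json
import os
import tempfile


def append_records(path, records):
    """Append each record as one JSON line, using a single write call."""
    if not records:
        return
    # default=str converts values json cannot encode (e.g. datetime) to strings.
    lines = "\n".join(json.dumps(r, default=str) for r in records) + "\n"
    with open(path, "a", encoding="utf-8") as fh:  # "a" creates the file if missing
        fh.write(lines)


def read_records(path):
    """Read the file back as a list of dicts, one per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    append_records(demo_path, [{"price": 0.256, "quality": 2}])
    append_records(demo_path, [{"price": 0.3, "quality": 1}])
    print(read_records(demo_path))
```

Appending never touches the existing contents, so the cost of each append depends on the size of the new rows and not on how large the file has grown.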

Upvotes: 1
