SecT0uch
SecT0uch

Reputation: 71

Most efficient way to overwrite a specific row in a CSV file

Given the following csv file:

01;blue;brown;black
02;glass;rock;paper
03;pigeon;squirel;shark

My goal is to replace the (unique) line containing '02' in the 1st position.

I wrote this piece of code:

with open("csv", 'r+', newline='', encoding='utf-8') as csvfile, open('csvout', 'w', newline='', encoding='utf-8') as out:
    reader = csv.reader(csvfile, delimiter=';')
    writer = csv.writer(out, delimiter=';')
    for row in reader:
        if row[0] != '02':
            writer.writerow(row)
        else:
            writer.writerow(['02', 'A', 'B', 'C'])

But re-writing the whole CSV in an other doesn't seem to be the most efficient way to proceed, especially for large files:

  1. Once the match is found, we continue to read till the end.
  2. We have to re-write every line one by one.
  3. Writing a second file isn't very practical nor is storage efficient.

I wrote a second piece of code who seems to answer to these two problems :

with open("csv", 'r+', newline='', encoding='utf-8') as csvfile:
    content = csvfile.readlines()
    for index, row in enumerate(content):
        row = row.split(';')
        if row[2] == 'rock':
            tochange = index
            break
    content.pop(tochange)
    content.insert(tochange, '02;A;B;C\n')
    content = "".join(content)
    csvfile.seek(0)
    csvfile.truncate(0)     # Erase content
    csvfile.write(content)

Do you agree that the second solution is more efficient? Do you have any improvement, or better way to proceed?

The number of character in the line can vary.

I'm apparently obliged to read and rewrite everything, if I don't want to use padding. A possible solution would be a database-like solution, I will consider it for the future.

If I had to choose between those 2 solutions, which one would be the best performance-wise?

Upvotes: 0

Views: 714

Answers (1)

SecT0uch
SecT0uch

Reputation: 71

As the caracter in the line may vary, I either have to read/write the whole file or; as @tobias_k said, use seek() to come back to the begining of the line and:

  • If the line is shorter, write just the line and pad with spaces;
  • If same length, write just the line;
  • If it's longer re-write that line and the following.

I want to avoid using padding so I used time.perf_counter() to measure exec time of both codes, and the second solution appears to be (almost 2*) faster (CSV of 10 000 lines, match at the 6 000th).

One alternative would be to migrate to a relational database.

Upvotes: 1

Related Questions