Reputation: 71
Given the following csv file:
01;blue;brown;black
02;glass;rock;paper
03;pigeon;squirel;shark
My goal is to replace the (unique) line whose first field is '02'.
I wrote this piece of code:
with open("csv", 'r+', newline='', encoding='utf-8') as csvfile, open('csvout', 'w', newline='', encoding='utf-8') as out:
reader = csv.reader(csvfile, delimiter=';')
writer = csv.writer(out, delimiter=';')
for row in reader:
if row[0] != '02':
writer.writerow(row)
else:
writer.writerow(['02', 'A', 'B', 'C'])
But rewriting the whole CSV into another file doesn't seem to be the most efficient way to proceed, especially for large files, and it leaves me with a second file to manage.
So I wrote a second piece of code that seems to address both problems:
with open("csv", 'r+', newline='', encoding='utf-8') as csvfile:
content = csvfile.readlines()
for index, row in enumerate(content):
row = row.split(';')
if row[2] == 'rock':
tochange = index
break
content.pop(tochange)
content.insert(tochange, '02;A;B;C\n')
content = "".join(content)
csvfile.seek(0)
csvfile.truncate(0) # Erase content
csvfile.write(content)
Do you agree that the second solution is more efficient? Do you have any improvements, or a better way to proceed?
The number of characters in each line can vary.
I apparently have to read and rewrite the whole file if I don't want to use padding. A database-like solution could be an option; I will consider it in the future.
If I had to choose between these two solutions, which one would be better performance-wise?
Upvotes: 0
Views: 714
Reputation: 71
As the number of characters in a line may vary, I either have to read/write the whole file or, as @tobias_k said, use seek() to come back to the beginning of the line and pad the new content to the original length.
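For reference, a minimal sketch of that seek()-and-pad idea (assuming ASCII content, plain '\n' line endings, and a replacement that is never longer than the original line):

with open("csv", 'r+', newline='', encoding='utf-8') as csvfile:
    while True:
        pos = csvfile.tell()              # offset of the start of the current line
        line = csvfile.readline()
        if not line:                      # end of file, no match found
            break
        if line.split(';')[0] == '02':
            new = '02;A;B;C'
            csvfile.seek(pos)             # jump back to the start of the line
            # pad with spaces so the replacement is exactly as long as the old line
            csvfile.write(new.ljust(len(line.rstrip('\n'))) + '\n')
            break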
I want to avoid padding, so I used time.perf_counter() to measure the execution time of both snippets, and the second solution turns out to be almost twice as fast (CSV of 10,000 lines, match at the 6,000th line).
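Roughly how I timed them (a sketch: replace_by_copy and replace_in_place stand for the two snippets from the question wrapped in functions, and "csv.orig" is a pristine copy used to reset the test file before each run):

import shutil
import time

def replace_by_copy():
    ...  # first snippet from the question, wrapped in a function (placeholder)

def replace_in_place():
    ...  # second snippet from the question, wrapped in a function (placeholder)

def benchmark(func, repeats=10):
    # average elapsed time over several runs, restoring the input file each time
    total = 0.0
    for _ in range(repeats):
        shutil.copyfile("csv.orig", "csv")
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats

print("copy to a second file:", benchmark(replace_by_copy))
print("rewrite in place:     ", benchmark(replace_in_place))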
One alternative would be to migrate to a relational database.
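For example, with the sqlite3 module from the standard library (table and column names below are only illustrative), the replacement becomes a single indexed UPDATE instead of a full file rewrite:

import sqlite3

conn = sqlite3.connect("data.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS rows (id TEXT PRIMARY KEY, a TEXT, b TEXT, c TEXT)"
)

# one-off import of the existing CSV
with conn, open("csv", encoding="utf-8") as f:
    conn.executemany(
        "INSERT OR REPLACE INTO rows VALUES (?, ?, ?, ?)",
        (line.rstrip("\n").split(";") for line in f),
    )

# replacing the '02' line no longer touches the rest of the data
with conn:
    conn.execute(
        "UPDATE rows SET a = ?, b = ?, c = ? WHERE id = ?",
        ("A", "B", "C", "02"),
    )
conn.close()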
Upvotes: 1