Reputation: 35
The following code works fine, and I get an output file with a pipe as the delimiter. However, I do not want a new file to be generated; instead, I would like the existing file to be rewritten in place with pipe delimiters instead of commas. I appreciate your input. I am new to Python and learning it as I go.
import csv

with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
Upvotes: 2
Views: 2019
Reputation: 1875
I'm not totally sure, but if the file is not too big, you can load it into pandas using read_csv
and then save it back with to_csv, passing whatever delimiter you like via sep. For example -
import pandas as pd

data = pd.read_csv(dst1, encoding='utf-8')
data.to_csv(dst1, sep='|', index=False, encoding='utf-8')
Hope this helps!!
Upvotes: 0
Reputation: 106488
Since you are simply replacing a single-character delimiter with another, there will be no change in file size or in the positions of any characters that are not being replaced. As such, this is a perfect scenario for opening the file in r+
mode, which allows writing the processed content back to the very same file as it is being read, so no temporary file is ever needed:
import csv

with open(dst, encoding='utf-8', errors='ignore') as input_file, \
        open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
EDIT: Please read @ShadowRanger's comment for limitations of this approach.
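One such limitation, sketched here as an illustration of my own (writing to an in-memory buffer rather than a real file): if any field happens to contain the new delimiter, the csv writer quotes that field, the rewritten row becomes longer than the original, and the in-place r+ rewrite then shifts and corrupts everything after it.

```python
import csv
import io

# A row whose first field contains the new delimiter character.
row = {'a': 'x|y', 'b': '2'}

buf = io.StringIO()
writer = csv.DictWriter(buf, ['a', 'b'], delimiter='|')
writer.writerow(row)

# The field is quoted, so the row grows: '"x|y"|2' is 7 characters
# where the original comma-delimited row 'x|y,2' was only 5.
print(buf.getvalue())
```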
Upvotes: 0
Reputation: 155363
The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile
module to make a temporary file in the same directory (so atomic replace will work):
import csv
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block
# exited and data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise
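The pattern above can be packaged into a reusable helper. This is just a sketch; the function name and the delimiter parameters are my own, not from the original post:

```python
import csv
import os
import tempfile


def rewrite_delimiter(path, old=',', new='|'):
    """Rewrite the CSV at *path* in place, swapping *old* for *new* as the
    delimiter, via an atomic replace of a same-directory temporary file."""
    with open(path, encoding='utf-8', errors='ignore', newline='') as input_file, \
         tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                     dir=os.path.dirname(os.path.abspath(path)),
                                     delete=False) as tf:
        try:
            reader = csv.DictReader(input_file, delimiter=old)
            writer = csv.DictWriter(tf, reader.fieldnames, delimiter=new)
            writer.writeheader()
            writer.writerows(reader)
        except BaseException:
            # On error, remove the temporary before reraising
            os.remove(tf.name)
            raise
    # Atomic swap once the temporary is fully written and closed
    os.replace(tf.name, path)


# Usage: convert data.csv from commas to pipes, in place
# rewrite_delimiter('data.csv')
```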
Upvotes: 2