user11996418

Reputation: 35

Replace comma with pipe delimiter within the same file using python

The following is my code; it works fine and I get an output file with a pipe delimiter. However, I do not want a new file to be generated; rather, I would like the existing file to be rewritten with a pipe delimiter instead of a comma. I appreciate your inputs. I am new to Python and learning it on the go.

import csv

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)

Upvotes: 2

Views: 2019

Answers (3)

Aditya Mishra

Reputation: 1875

I'm not totally sure, but if the file is not too big, you can load it into pandas with read_csv and then save it back over the same path with to_csv, passing whatever delimiter you like via sep. For example -

import pandas as pd

data = pd.read_csv(input_file, encoding='utf-8')
# index=False stops pandas from writing its row index as an extra column
data.to_csv(input_file, sep='|', encoding='utf-8', index=False)

Hope this helps!!

Upvotes: 0

blhsing

Reputation: 106488

Since you are simply replacing a single-character delimiter with another single character, there will be no change in file size or in the positions of any characters that are not replaced. As such, this is a perfect scenario for opening the file in r+ mode, which allows writing the processed content back to the very same file being read, so no temporary file is ever needed:

import csv

with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)

EDIT: Please read @ShadowRanger's comment for limitations of this approach.
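One limitation worth spelling out: r+ never truncates, so if the rewritten content ends up shorter than the original (for example, when quoted fields shrink), the old file's trailing bytes survive the write. A minimal sketch, using a throwaway demo_inplace.csv file as an assumed path:

```python
import os

path = 'demo_inplace.csv'
with open(path, 'w', newline='') as f:
    f.write('a,b,c\n"1,1",2,3\n')   # 16 bytes

# Rewriting in place with r+ does not truncate: the write below covers
# only the first 6 bytes, and the rest of the original file survives.
with open(path, 'r+', newline='') as f:
    f.write('a|b|c\n')

with open(path, newline='') as f:
    content = f.read()

os.remove(path)
# content is 'a|b|c\n"1,1",2,3\n' -- the stale tail '"1,1",2,3\n' remains;
# calling f.truncate() after the final write would have removed it.
```

So the approach is only safe when the rewritten output is guaranteed to be exactly the same length as the input, as in the plain comma-to-pipe case above.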

Upvotes: 0

ShadowRanger

Reputation: 155363

The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):

import csv
import os
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # else is optional, if you want to be extra careful that all
        # data is synced to disk to reduce risk that metadata updates
        # before data synced to disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace original file with temporary now that with block exited and
# data fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise
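The same write-temp-then-os.replace pattern can be condensed into a reusable helper; convert_delimiter is a name I've made up for this sketch, and it uses plain csv.reader/csv.writer rather than the Dict variants:

```python
import csv
import os
import tempfile

def convert_delimiter(path, old=',', new='|'):
    """Rewrite a CSV file in place, swapping its delimiter atomically."""
    with open(path, encoding='utf-8', newline='') as input_file, \
         tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                     dir=os.path.dirname(path) or '.',
                                     delete=False) as tf:
        try:
            reader = csv.reader(input_file, delimiter=old)
            writer = csv.writer(tf, delimiter=new)
            writer.writerows(reader)
        except:
            # On error, remove temporary before reraising exception
            os.remove(tf.name)
            raise
    # Atomic on POSIX, and on Windows since Python 3.3's os.replace
    os.replace(tf.name, path)

# Example: a comma-separated file becomes pipe-separated in place.
with open('sample.csv', 'w', newline='') as f:
    f.write('a,b,c\n1,2,3\n')
convert_delimiter('sample.csv')
with open('sample.csv', newline='') as f:
    result = f.read()
os.remove('sample.csv')
```

Either way, the original file is replaced in a single step, so a crash mid-conversion leaves it untouched.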

Upvotes: 2
