Nick Johnson
Nick Johnson

Reputation: 95

Replacing Delimiter In CSV Files with Python

I have a folder with several CSV files. These files all contain the box drawing double vertical and horizontal as the delimiter. I am trying to import all these files into python, change that delimiter to a pipe, and then save the new files into another location. The code I currently have runs without any errors but doesn't actually do anything. Any suggestions?

import os
import pandas as pd

directory = 'Y:/Data'
dirlist = os.listdir(directory)
file_dict = {}
x = 0

for filename in dirlist:
    if filename.endswith('.csv'):
        file_dict[x] = pd.read_csv(filename)
        column = file_dict[x].columns[0]
        file_dict[x] = file_dict[x][column].str.replace('╬', '|')
        file_dict[x].to_csv("python/file{}.csv".format(x))
        x += 1

Here is a picture of sample data:

enter image description here

Upvotes: 1

Views: 4594

Answers (2)

matt2000
matt2000

Reputation: 1073

with i as open(filename):
    with o as open(filename+'.new', 'w+):
        for line in i.readlines():
            o.write(line.replace('╬', '|'))

or, skip the python, and use sed from your terminal:

$ sed -i 's/╬/|/g' *.csv

Assuming the original delimiter doesn't appear in any escaped strings, this should be slightly faster than using the regular csv module. Panada seems to do some filesystem voodoo when reading CSVs, so I wouldn't be too surprised if it is just as fast. sed will almost certainly beat them both by far.

Upvotes: 1

codelessbugging
codelessbugging

Reputation: 2909

Instead of directly replacing occurrences with the new character (which may replace escaped occurrences of the character as well), we can just use built-in functionality in the csv library to read the file for us, and then write it again

import csv
with open('myfile.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='╬')
    writer = csv.writer(outfile, delimiter='|')
    for row in reader:
        writer.writerow(row)

Adapted from the docs

Upvotes: 1

Related Questions