os.walk-ing through directory to read and write all the CSVs

Question

I have a bunch of folders and sub-folders with CSVs that have quotation marks that I need to get rid of, so I'm trying to build a script that iterates through and performs the operation on all CSVs.

Below is the code I have.

It correctly identifies what is and is not a CSV, and it rewrites them all, but it writes empty files instead of the row data without the quotation marks.

I know that this is happening around lines 14-19 but I don't know what to do.

import csv
import os


rootDir = '.'

for dirName, subDirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:

        # Check if it's a .csv first
        if fname.endswith('.csv'):

            input = csv.reader(open(fname, 'r'))
            output = open(fname, 'w')

            with output:
                writer = csv.writer(output)
                for row in input:
                    writer.writerow(row)

        # Skip if not a .csv
        else:
            print 'Not a .csv!!'

abarnert · Accepted Answer

The problem is here:

input = csv.reader(open(fname, 'r'))
output = open(fname, 'w')

As soon as you do that second open in 'w' mode, it erases the file. So, your input is looping over an empty file.
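To see the truncation happen, here is a minimal sketch assuming Python 3. The scratch file `demo.csv` and the variable names are made up for the demonstration; it records the file size before and after the second `open` and then lets the reader iterate.

```python
import csv
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.csv")
with open(path, "w", newline="") as f:
    f.write('"a","b"\n"1","2"\n')

size_before = os.path.getsize(path)

infile = open(path, "r", newline="")
reader = csv.reader(infile)              # lazy: nothing has been read yet
outfile = open(path, "w", newline="")    # 'w' mode truncates the file right here
size_after_open = os.path.getsize(path)

rows_seen = list(reader)                 # the reader now finds no data
infile.close()
outfile.close()

print(size_before, size_after_open, rows_seen)
```

The reader only pulls data when it is iterated, so by the time the loop starts there is nothing left to read.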

One way to fix this is to read the whole file into memory, and only then erase the file and rewrite it:

input = csv.reader(open(fname, 'r'))
contents = list(input)
output = open(fname, 'w')
with output:
    writer = csv.writer(output)
    for row in contents:
        writer.writerow(row)

You can simplify this quite a bit:

with open(fname, 'r') as infile:
    contents = list(csv.reader(infile))
with open(fname, 'w') as outfile:
    csv.writer(outfile).writerows(contents)
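Wrapped as a function, the read-then-rewrite approach might look like the sketch below. It assumes Python 3, and the name `rewrite_in_memory` is just illustrative. The `newline=""` argument is what the `csv` module documentation recommends when opening files for a reader or writer in Python 3.

```python
import csv


def rewrite_in_memory(path):
    """Read every row of the CSV at `path`, then overwrite the file.

    Returns the number of rows written.
    """
    # Finish reading (and close the file) before reopening it for writing.
    with open(path, "r", newline="") as infile:
        contents = list(csv.reader(infile))
    # Only now is it safe to truncate: all rows are already in memory.
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(contents)
    return len(contents)
```

With the writer's default settings, only fields that need quoting are quoted, so the unnecessary quotation marks are dropped on the way out.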

Alternatively, you can write to a temporary file as you go, and then move the temporary file on top of the original file. This is a bit more complicated, but it has a major advantage: if an error occurs (or someone turns off the computer) in the middle of writing, you still have the old file and can start over, instead of being left with 43% of the new file and the rest of your data lost:

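import tempfile  # in addition to csv and os

# Create the temp file next to the original so the final move stays on one filesystem.
dname = os.path.dirname(fname)
with open(fname, 'r') as infile, tempfile.NamedTemporaryFile('w', dir=dname, delete=False) as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(row)
# Both files are closed here; swap the finished copy in over the original.
os.replace(outfile.name, fname)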

If you're not using Python 3.3+, you don't have os.replace. On Unix, you can just use os.rename instead, but on Windows… it's a pain to get this right, and you probably want to look for a third-party library on PyPI. (I haven't used any of them, but if you're using Windows XP/2003 or later and Python 2.6/3.2 or later, pyosreplace looks promising.)
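Putting the temporary-file approach back into the directory walk could look like the following sketch, assuming Python 3.3+. The function names `rewrite_csv_atomically` and `rewrite_all_csvs` are hypothetical. One thing it adds beyond the snippets above: os.walk yields bare file names, so each name is joined with its directory before opening; otherwise files in sub-folders would be looked up relative to the current directory.

```python
import csv
import os
import tempfile


def rewrite_csv_atomically(path):
    """Rewrite one CSV through a temp file created in the same directory."""
    dname = os.path.dirname(path) or "."
    tmp = tempfile.NamedTemporaryFile("w", newline="", dir=dname, delete=False)
    try:
        # Both files are closed when this block exits, even on error.
        with tmp, open(path, "r", newline="") as infile:
            csv.writer(tmp).writerows(csv.reader(infile))
        # Swap the finished copy in over the original.
        os.replace(tmp.name, path)
    except BaseException:
        # Something failed before the swap: drop the partial temp file
        # and leave the original untouched.
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
        raise


def rewrite_all_csvs(root_dir):
    """Walk root_dir and rewrite every .csv file; return the paths handled."""
    rewritten = []
    for dir_name, _subdirs, file_list in os.walk(root_dir):
        for fname in file_list:
            if fname.endswith(".csv"):
                # os.walk gives bare names, so build the full path first.
                path = os.path.join(dir_name, fname)
                rewrite_csv_atomically(path)
                rewritten.append(path)
    return rewritten
```

Calling `rewrite_all_csvs('.')` would process the current directory tree. Note that the temporary file is created with permissions restricted to the current user, and those permissions carry over to the replaced file, so you may want to adjust them afterwards if others need to read the CSVs.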
