Reputation: 109
I have a bunch of folders and sub-folders with CSVs that have quotation marks that I need to get rid of, so I'm trying to build a script that iterates through and performs the operation on all CSVs.
Below is the code I have.
It correctly identifies what is and is not a CSV, and it rewrites them all, but it writes blank data instead of the row data without the quotation marks.
I know that this is happening around lines 14-19, but I don't know what to do.
import csv
import os
rootDir = '.'
for dirName, subDirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        # Check if it's a .csv first
        if fname.endswith('.csv'):
            input = csv.reader(open(fname, 'r'))
            output = open(fname, 'w')
            with output:
                writer = csv.writer(output)
                for row in input:
                    writer.writerow(row)
        # Skip if not a .csv
        else:
            print 'Not a .csv!!'
Upvotes: 0
Views: 69
Reputation: 365657
The problem is here:
input = csv.reader(open(fname, 'r'))
output = open(fname, 'w')
As soon as you do that second open in 'w' mode, it erases the file. So, your input is looping over an empty file.
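To see the effect in isolation, here is a minimal sketch (Python 3). The scratch file `demo.csv` and the variable names are made up for the demonstration and are not part of the original script:

```python
import csv
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(path, 'w', newline='') as f:
    f.write('"a","b"\n"1","2"\n')

infile = open(path, 'r', newline='')
reader = csv.reader(infile)            # lazy: no rows have been read yet
size_before = os.path.getsize(path)    # the original data is still on disk

outfile = open(path, 'w', newline='')  # opening in 'w' mode truncates right away
size_after = os.path.getsize(path)     # the file is now empty

rows = list(reader)                    # so the reader has nothing to yield
outfile.close()
infile.close()

print(size_before, size_after, rows)
```

The size drops to zero the moment the second open happens, and the reader comes back with no rows, which is the blank output described in the question.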
One way to fix this is to read the whole file into memory, and only then erase the file and rewrite it:
input = csv.reader(open(fname, 'r'))
contents = list(input)
output = open(fname, 'w')
with output:
    writer = csv.writer(output)
    for row in contents:
        writer.writerow(row)
You can simplify this quite a bit:
with open(fname, 'r') as infile:
    contents = list(csv.reader(infile))
with open(fname, 'w') as outfile:
    csv.writer(outfile).writerows(contents)
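For reference, here is a sketch of how the read-then-rewrite version might slot into the `os.walk` loop from the question (Python 3). Two things here are my assumptions rather than part of the code above: the function name `strip_quotes_in_tree` is made up, and the path is built with `os.path.join(dir_name, fname)` so that files in sub-folders resolve. I also pass `newline=''` to `open`, which the `csv` docs recommend in Python 3.

```python
import csv
import os

def strip_quotes_in_tree(root_dir):
    """Rewrite every .csv under root_dir through csv.reader/csv.writer.

    Hypothetical helper; returns the list of paths it rewrote.
    """
    rewritten = []
    for dir_name, _sub_dirs, file_list in os.walk(root_dir):
        for fname in file_list:
            if not fname.endswith('.csv'):
                continue
            # Assumption: join with dir_name so files in sub-folders are found.
            path = os.path.join(dir_name, fname)
            # Read everything first; the file is closed before it is truncated.
            with open(path, 'r', newline='') as infile:
                contents = list(csv.reader(infile))
            with open(path, 'w', newline='') as outfile:
                csv.writer(outfile).writerows(contents)
            rewritten.append(path)
    return rewritten
```

Calling `strip_quotes_in_tree('.')` would process the current directory tree. Note that `csv.writer` by default still quotes fields that need it (for example, ones containing a comma or a quote character), so only the unnecessary quotation marks disappear.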
Alternatively, you can write to a temporary file as you go, and then move the temporary file on top of the original file. This is a bit more complicated, but it has a major advantage: if you hit an error (or someone turns off the computer) in the middle of writing, you still have the old file and can start over, instead of being left with 43% of the new file and the rest of your data lost:
import tempfile  # needed for NamedTemporaryFile

dname = os.path.dirname(fname)
with open(fname, 'r') as infile, tempfile.NamedTemporaryFile('w', dir=dname, delete=False) as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(row)
# Replace the original only after both files have been closed
os.replace(outfile.name, fname)
If you're not using Python 3.3+, you don't have os.replace. On Unix, you can just use os.rename instead, but on Windows… it's a pain to get this right, and you probably want to look for a third-party library on PyPI. (I haven't used any of them, but if you're using Windows XP/2003 or later and Python 2.6/3.2 or later, pyosreplace looks promising.)
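If you are on Python 3.3+, the temporary-file approach could be wrapped up roughly like this. It is a sketch, not a hardened implementation: the function name `rewrite_csv_atomically` is made up, and the cleanup of the temporary file on failure and the `newline=''` arguments are my additions on top of the snippet above.

```python
import csv
import os
import tempfile

def rewrite_csv_atomically(path):
    """Rewrite one CSV via a temporary file in the same directory (hypothetical helper)."""
    # Same directory as the target, so the final replace stays on one filesystem.
    dname = os.path.dirname(os.path.abspath(path))
    tmp_name = None
    try:
        with open(path, 'r', newline='') as infile, \
                tempfile.NamedTemporaryFile('w', dir=dname, delete=False,
                                            newline='') as outfile:
            tmp_name = outfile.name
            csv.writer(outfile).writerows(csv.reader(infile))
        # Both files are closed here; swap the new file into place.
        os.replace(tmp_name, path)
    except BaseException:
        # Assumption: on any failure, drop the partial temp file and keep the original.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Because the rows are streamed from the reader to the writer, this version also avoids holding the whole file in memory, which may matter for large CSVs.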
Upvotes: 1