Reputation: 145
I have .csv files with a specific (text) column whose cells occasionally will contain a double quotation mark ("). When converted to shapefiles in ArcMap these single double quotation marks cause an erroneous conversion. They must be 'escaped'.
I need a script to edit the .csv so that it:
My script:
import csv
with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
reader = csv.reader(file1)
# Write column headers without quotes
headers = reader.next()
str1 = ''.join(headers)
writer = csv.writer(file2)
writer.writerow(headers)
# Write all other rows with quotes
writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
for row in reader:
writer.writerow(row)
This script successfully accomplishes BOTH of the above tasks across ALL columns.
For example, this original .csv:
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair
Becomes this:
Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"
But what if I want to accomplish this in the 3rd column ONLY (the one that actually has double quotation marks occasionally)?
In other words, how can I end up with this...?
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair
Upvotes: 0
Views: 993
Reputation: 177481
Is it sufficient to only quote fields that have a double quote in them? If so the default behavior of the csv
module will work, although I added skipinitialspace=True
when parsing the input file so it won't treat the space after the commas as significant.
Also per the csv
module documentation, I've opened the files in binary mode.
import csv
with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
reader = csv.reader(file1,skipinitialspace=True)
writer = csv.writer(file2)
for row in reader:
writer.writerow(row)
Input:
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair
Output:
Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair
If you require quoting every row of column 3, then you can do so manually. I've set the csv
module to quote nothing, and set the quote character to an unprintable control character that shouldn't appear in the input:
import csv
with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
reader = csv.reader(file1,skipinitialspace=True)
writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')
# Write column headers without quotes
headers = reader.next()
writer.writerow(headers)
# Write 3rd column with quotes
for row in reader:
row[2] = '"' + row[2].replace('"','""') + '"'
writer.writerow(row)
Output:
Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair
Upvotes: 1
Reputation: 565
You can try this:
import csv
with open("file.csv", "rU") as fin:
words = fin.readlines()
with open("cleaned.csv", "w") as fout:
writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar = '"', doublequote = True)
for row in words:
row = row.replace("\n", "")
newrow = []
for word in row.split(","):
newrow.append(word.strip())
writer.writerow(newrow)
First with open attemps to read it as a simple text file to bypass the badly formatted csv file. And then we write it to a csv file normally.
Upvotes: 0