How to Escape all Single Double Quotation Marks in Particular .csv Column with Python?

Question

Using Python 2.7.6
Need solution without use of Pandas Library

I have .csv files with a specific (text) column whose cells occasionally will contain a double quotation mark ("). When converted to shapefiles in ArcMap these single double quotation marks cause an erroneous conversion. They must be 'escaped'.

I need a script to edit the .csv so that it:

replaces all instances of " with "".
wraps each cell in double quotation marks.

My script:

import csv

with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)  

    # Write column headers without quotes
    headers = reader.next()
    str1 = ''.join(headers)
    writer = csv.writer(file2)
    writer.writerow(headers)

    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

This script successfully accomplishes BOTH of the above tasks across ALL columns.

For example, this original .csv:

Column 1, Column 2, Column 3, Column 4 
Fred, Flintstone, 5'10", black hair 
Wilma, Flintstone, five feet seven inches, red hair 
Barney, Rubble, 5 feet 2" inches, blond hair 
Betty, Rubble, 5 foot 7, black hair

Becomes this:

Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"

But what if I want to accomplish this in the 3rd column ONLY (the one that actually has double quotation marks occasionally)?

In other words, how can I end up with this...?

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair

Mark Tolonen · Accepted Answer

Is it sufficient to only quote fields that have a double quote in them? If so the default behavior of the csv module will work, although I added skipinitialspace=True when parsing the input file so it won't treat the space after the commas as significant.

Also per the csv module documentation, I've opened the files in binary mode.

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)  
    writer = csv.writer(file2)

    for row in reader:
        writer.writerow(row)

Input:

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair

If you require quoting every row of column 3, then you can do so manually. I've set the csv module to quote nothing, and set the quote character to an unprintable control character that shouldn't appear in the input:

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')

    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)

    # Write 3rd column with quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'
        writer.writerow(row)

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair

How to Escape all Single Double Quotation Marks in Particular .csv Column with Python?

Answers (2)

Related Questions