Waterman
Waterman

Reputation: 145

How to Escape all Single Double Quotation Marks in Particular .csv Column with Python?

I have .csv files with a specific (text) column whose cells occasionally will contain a double quotation mark ("). When converted to shapefiles in ArcMap these single double quotation marks cause an erroneous conversion. They must be 'escaped'.

I need a script to edit the .csv so that it:

  1. replaces all instances of " with "".
  2. wraps each cell in double quotation marks.

My script:

import csv

with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)  

    # Write column headers without quotes
    headers = reader.next()
    str1 = ''.join(headers)
    writer = csv.writer(file2)
    writer.writerow(headers)

    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

This script successfully accomplishes BOTH of the above tasks across ALL columns.

For example, this original .csv:

Column 1, Column 2, Column 3, Column 4 
Fred, Flintstone, 5'10", black hair 
Wilma, Flintstone, five feet seven inches, red hair 
Barney, Rubble, 5 feet 2" inches, blond hair 
Betty, Rubble, 5 foot 7, black hair

Becomes this:

Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"

But what if I want to accomplish this in the 3rd column ONLY (the one that actually has double quotation marks occasionally)?

In other words, how can I end up with this...?

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair

Upvotes: 0

Views: 993

Answers (2)

Mark Tolonen
Mark Tolonen

Reputation: 177481

Is it sufficient to only quote fields that have a double quote in them? If so the default behavior of the csv module will work, although I added skipinitialspace=True when parsing the input file so it won't treat the space after the commas as significant.

Also per the csv module documentation, I've opened the files in binary mode.

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)  
    writer = csv.writer(file2)

    for row in reader:
        writer.writerow(row)

Input:

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair

If you require quoting every row of column 3, then you can do so manually. I've set the csv module to quote nothing, and set the quote character to an unprintable control character that shouldn't appear in the input:

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')

    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)

    # Write 3rd column with quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'
        writer.writerow(row)

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair

Upvotes: 1

jason adams
jason adams

Reputation: 565

You can try this:

    import csv
with open("file.csv", "rU") as fin:
    words = fin.readlines()

with open("cleaned.csv", "w") as fout:
    writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar = '"', doublequote = True)
    for row in words:
        row = row.replace("\n", "")
        newrow = []
        for word in row.split(","): 
            newrow.append(word.strip())
        writer.writerow(newrow)

First with open attemps to read it as a simple text file to bypass the badly formatted csv file. And then we write it to a csv file normally.

Upvotes: 0

Related Questions