Waterman

Reputation: 145

Make CSV escape Double Quotation Marks

I need to prepare a .csv file so that double quotation marks inside a field are ignored by the program processing it (ArcMap). ArcMap was merging the contents of all following cells on that line into any earlier cell that contained a double quotation mark, and no further rows were processed at all.

How does one make a CSV escape Double Quotation Marks for successful processing in ArcMap (10.2)?

Upvotes: 0

Views: 1978

Answers (3)

Waterman

Reputation: 145

What worked for me was writing a module that pre-processes the CSV file, as follows. The key line is the one that creates the writer with quoting=csv.QUOTE_ALL, which wraps every field in double quotes and doubles any double quote found inside a field. Hopefully this is useful to others.

def work(Source_CSV):
    # Python 2 code (ArcMap 10.2 scripting uses Python 2.7). On Python 3, open
    # the files with modes 'r' and 'w' plus newline='' instead of 'rb'/'wb'.
    import csv, os

    # Derive name and location for newly-formatted .csv file
    Head, Tail = os.path.split(Source_CSV)
    name = os.path.splitext(Tail)[0]
    new_folder = "formatted"
    new_path = os.path.join(Head, new_folder)
    if not os.path.isdir(new_path):
        os.makedirs(new_path)  # open() below fails if the folder is missing
    Formatted_CSV = os.path.join(new_path, name + "_formatted.csv")

    # Populate the new .csv file with quotation marks around all field contents ("quoting=csv.QUOTE_ALL")
    with open(Source_CSV, 'rb') as file1, open(Formatted_CSV, 'wb') as file2:

        # Instantiate the .csv reader
        reader = csv.reader(file1, skipinitialspace=True)

        # Write column headers with the default (minimal) quoting
        headers = next(reader)  # consumes the first row of the .csv
        writer = csv.writer(file2)
        writer.writerow(headers)

        # Write all other rows wrapped in double quotes
        # (An earlier attempt, since abandoned, used quoting=csv.QUOTE_NONE and
        # doubled the quotes in the third column by hand.)
        writer = csv.writer(file2, delimiter=',', quoting=csv.QUOTE_ALL)
        for row in reader:
            writer.writerow(row)

    return Formatted_CSV
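To see what quoting=csv.QUOTE_ALL does to a field that already contains double quotes, here is a minimal Python 3 sketch that writes to an in-memory buffer instead of a file. The helper name quote_all and the sample row are made up for illustration and are not part of the module above.

```python
import csv
import io


def quote_all(rows):
    """Return CSV text for rows with every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row: the second field contains embedded double quotes.
sample = [["1", 'Pipe "A" outlet', "north"]]
text = quote_all(sample)
print(text)

# Reading the text back with the default dialect recovers the original fields.
assert list(csv.reader(io.StringIO(text))) == sample
```

Every field comes out wrapped in double quotes and each embedded quote is doubled, which is the form that worked for ArcMap in my case.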

Upvotes: 0

Norrius

Reputation: 7920

You could try reading the file with the csv module and writing it back out, in the hope that the output format will be more digestible for your other tool. See the csv module documentation for the formatting options.

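import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('in.csv', 'r', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fout, delimiter='\t')
    # alternative: escape quotes with a backslash instead of doubling them
    # writer = csv.writer(fout, delimiter='\t', escapechar='\\', doublequote=False)
    for line in reader:
        writer.writerow(line)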
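As a rough illustration of the commented-out alternative, the sketch below writes one row with escapechar='\\' and doublequote=False and then reads it back with the same options. The sample row is invented, and whether ArcMap understands backslash-escaped quotes is an assumption you would need to check.

```python
import csv
import io

# Hypothetical sample row; the first field contains embedded double quotes.
row = ['say "hi"', "2"]

# Same options as the commented-out writer above.
options = dict(delimiter="\t", escapechar="\\", doublequote=False)

# Write the row to an in-memory buffer with backslash escaping.
buffer = io.StringIO()
csv.writer(buffer, **options).writerow(row)
escaped = buffer.getvalue()
print(repr(escaped))

# A reader configured with the same options recovers the original row.
read_back = next(csv.reader(io.StringIO(escaped), **options))
assert read_back == row
```

The reader must be given the same escapechar and doublequote settings as the writer, otherwise the backslashes are kept as literal characters.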

Upvotes: 0

Atul Shanbhag

Reputation: 636

Let's say df is the DataFrame created from the csv file as follows:

import pandas as pd
df = pd.read_csv('filename.csv')

Let us assume that comments is the name of the column where the issue occurs, i.e. you want to replace every double quote (") with an empty string ('').

The following one-liner does that for you. It replaces every double quote in every row of df['comments'] with an empty string.

df['comments'] = df['comments'].apply(lambda x: x.replace('"', ''))

The lambda receives each value of df['comments'] in turn as x. This assumes every value in the column is a string; a missing value (NaN) has no replace method and would raise an error.

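EDIT: If you want to keep the double quotes and escape them instead, be aware that a raw-string prefix will not do it. The following one-liner, very similar to the one above, leaves string values unchanged:

df['comments'] = df['comments'].apply(lambda x: r'{0}'.format(x))

The r before a string literal only changes how backslashes in that literal are interpreted in Python source code; it does not alter existing string data. In a CSV file, a double quote inside a field is escaped by quoting the field and doubling the quote, which df.to_csv does by default when the DataFrame is written back out.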

Upvotes: 1
