tan
tan

Reputation: 43

Commas and double-quotes in CSV files

I have a csv file with commas and double-quotes. Double quotes, however, are only used when there are commas inside the string. Like this:

AAA, BBB, "ACB,BCA", 123, "1,987"
DDC, CHA, ACH;HDC, "2,34", 192

I don't want to parse the commas in the double quotes and would like to have the following output:

x1    x2    x3       x4    x5
AAA   BBB   ACB,BCA  123   1,987
DDC   CHA   ACH;HDC  2,34  192 

In short, ignore the commas for values with double-quotes and consider the commas for values without double-quotes.

I am using this to merge multiple CSV files into a single CSV file, while adding some columns referring to the file name:

with open(outfile, "wb") as outfile:
    writer = None
    for input_filename in filenames:
        with open(input_filename, "rb") as infile:
            reader = csv.DictReader(infile, quotechar='"', delimiter=',')
            if writer is None:
                field_names = ["index1"] + ["index2"] + reader.fieldnames
                writer = csv.DictWriter(outfile, field_names)
                writer.writeheader()
            for row in reader:
                row["index1"] = input_filename[1:10]
                row["index2"] = input_filename[10:20]
                writer.writerow(row)

My specific solution to reading the CSVs is:

csv.DictReader(infile, quotechar='"', delimiter=',')

but that clearly does not work.

Any suggestions?

EDIT: a better example below:

the csv file I'm trying to open is structured like this:

x 1,x 2,x 3,x 4,x5 AAA, “BB,B”, CCC, DDD, EEE AA1, B;B2, CC3, DD4, EE5

The parsers I've tried so far, csv.DictReader, pd.read_csv or csv.reader seem to read the ";" in B;B2 as a row breaker which messes all the following columns.

The following code solves the "BB,B" problem, but still breaks B;B2 into a new row

csv.reader(fileObject, quotechar='"', delimiter=',',
             quoting=csv.QUOTE_ALL, skipinitialspace=True)

Upvotes: 2

Views: 12104

Answers (1)

Sagar Thatte
Sagar Thatte

Reputation: 75

The below code snippet still worked for me and did not break on the ';' symbol

csv.DictReader(fileObject, quotechar='"', delimiter=',',
                 quoting=csv.QUOTE_ALL, skipinitialspace=True)

Has worked for me.

This is the output I received:

{'x1': 'AAA', 'x2': 'BBB', 'x4': '123', 'x5': '1,987', 'x3': 'ACB,BCA'}
{'x1': 'DDC', 'x2': 'CHA', 'x4': '2,34', 'x5': '192', 'x3': 'ACH;HDC'}

Upvotes: 1

Related Questions