A single CSV column includes commas - Python

Question

I have a CSV file, with 3 columns. But, one of these 3 columns include commas that break the CSV format. My csv is as below:

id,name,score
1,Black,1
2, Brown,J,0

I want to copy only the second column in a different CSV file. My code looks like below:

for row in inpTweets:
            total_score = 0
            name = row [1]
writer.writerow([row [1], total_score])

Is there any appropriate way to change this format, in order to choose all the name field using python?

Gord Thompson · Accepted Answer

Because your source CSV file is malformed, you will get a different number of elements when the CSV reader splits the various lines. For example,

import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    header_row = True
    for row in inpTweets:
        if header_row:
            header_row = False
        else:
            print(row)

will print

['1', 'Black', '1']
['2', 'Brown', 'J', '0']

Notice that the first list conains three (3) elements and the second list contains four (4) elements.

If we know that

the source file is supposed to contain only three columns, and
the first and last columns are "id" and "score"

then we can "glue" second column back together from the intermediate elements in the list, i.e.,

row[1] + ', ' + row[2] + ... + row[n-1]

That can be done with a list comprehension over range(1, len(row) - 1) ...

[row[x] for x in range(1, len(row) - 1)]

... which we can then pass to ', '.join() to "glue" the individual elements back into a string

', '.join([row[x] for x in range(1, len(row) - 1)])

The final code would look something like this:

import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    with open(r'C:\Users\Gord\Desktop\output.csv', 'wb') as csv_out:
        writer = csv.writer(csv_out, quoting=csv.QUOTE_NONNUMERIC)
        header_row = True
        for row in inpTweets:
            if header_row:
                header_row = False
            else:
                out_row = [', '.join([row[x] for x in range(1, len(row) - 1)]), 0]
                writer.writerow(out_row)

and the resulting output CSV file would be

"Black",0
"Brown, J",0

A single CSV column includes commas - Python

Answers (1)

Related Questions