Nelly Brili
Nelly Brili

Reputation: 35

A single CSV column includes commas - Python

I have a CSV file, with 3 columns. But, one of these 3 columns include commas that break the CSV format. My csv is as below:

id,name,score
1,Black,1
2, Brown,J,0

I want to copy only the second column in a different CSV file. My code looks like below:

for row in inpTweets:
            total_score = 0
            name = row [1]
writer.writerow([row [1], total_score])

Is there any appropriate way to change this format, in order to choose all the name field using python?

Upvotes: 0

Views: 998

Answers (1)

Gord Thompson
Gord Thompson

Reputation: 123419

Because your source CSV file is malformed, you will get a different number of elements when the CSV reader splits the various lines. For example,

import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    header_row = True
    for row in inpTweets:
        if header_row:
            header_row = False
        else:
            print(row)

will print

['1', 'Black', '1']
['2', 'Brown', 'J', '0']

Notice that the first list conains three (3) elements and the second list contains four (4) elements.

If we know that

  • the source file is supposed to contain only three columns, and
  • the first and last columns are "id" and "score"

then we can "glue" second column back together from the intermediate elements in the list, i.e.,

row[1] + ', ' + row[2] + ... + row[n-1]

That can be done with a list comprehension over range(1, len(row) - 1) ...

[row[x] for x in range(1, len(row) - 1)]

... which we can then pass to ', '.join() to "glue" the individual elements back into a string

', '.join([row[x] for x in range(1, len(row) - 1)])

The final code would look something like this:

import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
    inpTweets = csv.reader(csv_in, skipinitialspace=True)
    with open(r'C:\Users\Gord\Desktop\output.csv', 'wb') as csv_out:
        writer = csv.writer(csv_out, quoting=csv.QUOTE_NONNUMERIC)
        header_row = True
        for row in inpTweets:
            if header_row:
                header_row = False
            else:
                out_row = [', '.join([row[x] for x in range(1, len(row) - 1)]), 0]
                writer.writerow(out_row)

and the resulting output CSV file would be

"Black",0
"Brown, J",0

Upvotes: 1

Related Questions