Reputation: 35
I have a CSV file, with 3 columns. But, one of these 3 columns include commas that break the CSV format. My csv is as below:
id,name,score
1,Black,1
2, Brown,J,0
I want to copy only the second column in a different CSV file. My code looks like below:
for row in inpTweets:
total_score = 0
name = row [1]
writer.writerow([row [1], total_score])
Is there any appropriate way to change this format, in order to choose all the name field using python?
Upvotes: 0
Views: 998
Reputation: 123419
Because your source CSV file is malformed, you will get a different number of elements when the CSV reader splits the various lines. For example,
import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
inpTweets = csv.reader(csv_in, skipinitialspace=True)
header_row = True
for row in inpTweets:
if header_row:
header_row = False
else:
print(row)
will print
['1', 'Black', '1']
['2', 'Brown', 'J', '0']
Notice that the first list conains three (3) elements and the second list contains four (4) elements.
If we know that
then we can "glue" second column back together from the intermediate elements in the list, i.e.,
row[1] + ', ' + row[2] + ... + row[n-1]
That can be done with a list comprehension over range(1, len(row) - 1)
...
[row[x] for x in range(1, len(row) - 1)]
... which we can then pass to ', '.join()
to "glue" the individual elements back into a string
', '.join([row[x] for x in range(1, len(row) - 1)])
The final code would look something like this:
import csv
with open(r'C:\Users\Gord\Desktop\broken.csv', 'rb') as csv_in:
inpTweets = csv.reader(csv_in, skipinitialspace=True)
with open(r'C:\Users\Gord\Desktop\output.csv', 'wb') as csv_out:
writer = csv.writer(csv_out, quoting=csv.QUOTE_NONNUMERIC)
header_row = True
for row in inpTweets:
if header_row:
header_row = False
else:
out_row = [', '.join([row[x] for x in range(1, len(row) - 1)]), 0]
writer.writerow(out_row)
and the resulting output CSV file would be
"Black",0
"Brown, J",0
Upvotes: 1