Reputation: 191
I am reading a CSV file in Python that has inconsistently placed quotation marks, and I am trying to remove them. The code below works fine if there is only one set of double quotes per line in the CSV file, but it gets thrown off if there are multiple sets, as in the fourth line (the third line of data after the header).
I have tried a number of different methods, but I always seem to end up with the elements combined incorrectly.
Sample csv:
First,Nickname,Last,Sport
Bill,Bats,Smith,Baseball
Tom,Kicks,Johnson,Soccer
"John,"Footy",Jacobsen,Football"
Mike,"Mikey",Jones,Basketball
My Code:
import csv
with open('fake.csv', mode='r', encoding='utf-8') as infile:
    reader = csv.reader(infile)
    for line in reader:
        if len(line) < 4:
            for i in range(0, len(line)):
                line[i].strip('"')
                line[i].replace('"', '')
        print(line)
        print(line[0] + line[2])
Desired output:
['First', 'Nickname', 'Last', 'Sport']
FirstLast
['Bill', 'Bats', 'Smith', 'Baseball']
BillSmith
['Tom', 'Kicks', 'Johnson', 'Soccer']
TomJohnson
['John', 'Footy', 'Jacobsen', 'Football']
JohnJacobsen
['Mike', 'Mikey', 'Jones', 'Basketball']
MikeJones
My Output:
['First', 'Nickname', 'Last', 'Sport']
FirstLast
['Bill', 'Bats', 'Smith', 'Baseball']
BillSmith
['Tom', 'Kicks', 'Johnson', 'Soccer']
TomJohnson
['John,Footy"', 'Jacobsen', 'Football"']
John,Footy"Football"
['Mike', 'Mikey', 'Jones', 'Basketball']
MikeJones
Any help would be appreciated.
Upvotes: 0
Views: 558
Reputation: 37319
The reader will be expecting the quote characters to wrap entries that contain your delimiter, so it's working as expected. If your input contains unbalanced or inaccurate quoting, as in this example, one option is to tell the reader not to treat quotes specially at all:
reader = csv.reader(infile, quoting=csv.QUOTE_NONE)
You'd then have to process quotes yourself, so this is not the best choice if your input is consistently quoted.
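As a rough sketch of what "processing quotes yourself" could look like for the sample in the question: the snippet below reads with `quoting=csv.QUOTE_NONE` and then removes every double quote from each field. The helper name `read_rows_without_quotes` is made up for illustration, and `io.StringIO` stands in for `fake.csv` so the example is self-contained. It assumes that no field is supposed to contain a comma or a literal double quote, which holds for the sample data but may not hold for yours.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the file on disk.
SAMPLE = '''First,Nickname,Last,Sport
Bill,Bats,Smith,Baseball
Tom,Kicks,Johnson,Soccer
"John,"Footy",Jacobsen,Football"
Mike,"Mikey",Jones,Basketball
'''


def read_rows_without_quotes(infile):
    """Split each line on commas only, then drop every double quote (hypothetical helper)."""
    # QUOTE_NONE tells the reader not to give quote characters any special meaning.
    reader = csv.reader(infile, quoting=csv.QUOTE_NONE)
    # str.replace returns a new string, so the cleaned value has to be collected;
    # calling it without using the result leaves the original field unchanged.
    return [[field.replace('"', '') for field in row] for row in reader]


rows = read_rows_without_quotes(io.StringIO(SAMPLE))
for row in rows:
    print(row)
    print(row[0] + row[2])
```

To use this with a real file, pass the open file object instead of the `io.StringIO` wrapper; the `csv` module documentation recommends opening the file with `newline=''`. Note that the loop in the question discards the results of `strip` and `replace`, which is why the quotes survive there even on lines where the code runs.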
Upvotes: 2