Reputation: 55
I have a file that contains a tab-delimited header and data line, like so:
ID Field1
test1 "A","B"
Here's my parsing script.
import csv

with open(dataFile) as tsv:
    for line in csv.reader(tsv, delimiter='\t'):
        print(line)
And the output:
['ID', 'Field1']
['test1', 'A,"B"']
I can't figure out why it strips the double quotes from the first quoted item of the second field but not from the second one. I've tried different dialects and csv.reader settings with no success.
Upvotes: 2
Views: 1486
Reputation: 301
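The default quotechar for csv.reader is the double quote ("). Because the second field starts with ", the reader treats "A" as a quoted section and removes that pair of quotes; the remaining ,"B" comes after the closing quote, so it is kept as literal text. Changing quotechar to a character that never appears in your data, such as '|', will solve your problem. You can do it like this: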
import csv

with open(dataFile) as tsv:
    for line in csv.reader(tsv, delimiter='\t', quotechar='|'):
        print(line)
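To see the difference without a data file, here is a minimal sketch that parses an in-memory copy of the sample line with the default quotechar and with quotechar='|'. The use of io.StringIO and the helper name parse are assumptions for illustration only; they are not part of the original script.

```python
import csv
import io

# In-memory stand-in for the data file (the separator is a real tab character).
data = 'ID\tField1\ntest1\t"A","B"\n'


def parse(**kwargs):
    """Hypothetical helper: parse the sample TSV text with extra csv.reader options."""
    return list(csv.reader(io.StringIO(data), delimiter='\t', **kwargs))


default_rows = parse()            # quotechar defaults to '"'
pipe_rows = parse(quotechar='|')  # '|' never appears in the data

print(default_rows[1])  # ['test1', 'A,"B"']
print(pipe_rows[1])     # ['test1', '"A","B"']
```

With '|' as the quote character, nothing in the line looks like a quoted field, so the double quotes pass through untouched.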
From https://docs.python.org/3/library/csv.html#csv.Dialect.quotechar:
Dialect.quotechar
A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.
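The quoted documentation explains what quotechar is for, but not why the reader keeps ,"B" instead of rejecting it. That comes from the csv dialect's strict setting, which defaults to False: after a closing quote, any character other than the delimiter or a line ending is simply appended to the field. A small sketch, assuming the same sample line held in a string, contrasts the default with strict=True, which raises csv.Error on that input.

```python
import csv
import io

line = 'test1\t"A","B"\n'

# Default (strict=False): the stray text after the closing quote of "A" is kept.
lenient = next(csv.reader(io.StringIO(line), delimiter='\t'))
print(lenient)  # ['test1', 'A,"B"']

# strict=True: text right after a closing quote that is not the delimiter is an error.
try:
    strict_row = next(csv.reader(io.StringIO(line), delimiter='\t', strict=True))
except csv.Error as exc:
    strict_row = None
    print('csv.Error:', exc)
```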
EDIT:
You can also pass quoting=csv.QUOTE_NONE to disable quoting entirely.
Upvotes: 3
Reputation: 123541
You just need to tell csv.reader to ignore quoting, via the csv.QUOTE_NONE option:
import csv

with open(dataFile) as tsv:
    for line in csv.reader(tsv, delimiter='\t', quoting=csv.QUOTE_NONE):
        print(line)
Output:
['ID', 'Field1']
['test1', '"A","B"']
Upvotes: 2
Reputation: 436
It seems you are splitting only on tabs and never on the commas inside the second field; I would change your code to reflect this.
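One way to act on this suggestion is sketched below. It assumes the file always has exactly two tab-separated columns and that Field1 holds comma-separated, double-quoted values; the in-memory data and the variable names are illustrative, not from the original post. The tab level is read with quoting disabled, and each Field1 value is then parsed a second time with csv.reader's default comma delimiter and '"' quotechar.

```python
import csv
import io

# In-memory stand-in for the data file (the separator is a real tab character).
data = 'ID\tField1\ntest1\t"A","B"\n'

reader = csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE)
header = next(reader)  # ['ID', 'Field1']

rows = []
for record_id, field1 in reader:  # assumes exactly two columns per line
    # csv.reader accepts any iterable of strings, so a one-element list works here.
    values = next(csv.reader([field1]))
    rows.append((record_id, values))

print(header)  # ['ID', 'Field1']
print(rows)    # [('test1', ['A', 'B'])]
```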
Upvotes: 0