Reputation: 626
I am working with Concordance load files that I need to edit, so I am using Python. The columns are delimited by the pilcrow character ¶ and use þ as the quote character.
The problem is the quote character: Python's csv module only accepts a single-character quotechar. (Writing CSV files is not a problem.)
Question: how can I read a CSV file in Python where the quotechar is multi-character?
Example of the CSV file:
þcol_1þ¶þcol_2þ¶þcol_3þ¶þcol_4þ
Upvotes: 1
Views: 1630
Reputation: 1124558
The Concordance file format is 8-bit encoded, and the ¶ and þ characters are really Latin-1 encoded. That means each is stored as a single byte, 0xB6 and 0xFE respectively.
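A quick Python 3 check of those byte values. It also hints at a likely (but unconfirmed) source of the "multi-character" error: if the script itself was saved as UTF-8, a Python 2 byte-string literal such as '¶' holds two bytes rather than one, which the csv module rejects.

```python
# Compare the Latin-1 and UTF-8 encodings of the two Concordance
# control characters: one byte each in Latin-1, two bytes in UTF-8.
PILCROW = "\u00b6"  # the column delimiter
THORN = "\u00fe"    # the quote character

ENCODED = {
    name: (ch.encode("latin-1"), ch.encode("utf-8"))
    for name, ch in (("pilcrow", PILCROW), ("thorn", THORN))
}

for name, (latin1, utf8) in ENCODED.items():
    print(f"{name}: latin-1={latin1!r} ({len(latin1)} byte), "
          f"utf-8={utf8!r} ({len(utf8)} bytes)")
```

This prints `pilcrow: latin-1=b'\xb6' (1 byte), utf-8=b'\xc2\xb6' (2 bytes)` followed by `thorn: latin-1=b'\xfe' (1 byte), utf-8=b'\xc3\xbe' (2 bytes)`.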
The Python 2 csv module accepts those bytes quite happily:
csv.reader(fileobj, delimiter='\xb6', quotechar='\xfe')
As usual for the csv module, make sure to open the file in binary mode so that newline handling is left to the module.
In Python 3, open the file in text mode with newline='' and encoding='latin1', and use either the \xhh escapes above or the actual characters, i.e. delimiter='¶', quotechar='þ'.
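A minimal Python 3 sketch of that approach, assuming the load file is plain Latin-1 and that every field is quoted like the sample line. `read_loadfile` and `write_loadfile` are hypothetical helper names, and `quoting=csv.QUOTE_ALL` is an assumption chosen to reproduce the sample's quote-every-field style on output.

```python
import csv
import os
import tempfile

DELIMITER = "\xb6"  # pilcrow, byte 0xB6 in Latin-1
QUOTECHAR = "\xfe"  # thorn, byte 0xFE in Latin-1


def read_loadfile(path):
    """Read a Concordance-style load file into a list of rows (hypothetical helper)."""
    # newline='' leaves line-ending handling to the csv module;
    # latin-1 maps each byte to the code point with the same value.
    with open(path, newline="", encoding="latin-1") as f:
        return list(csv.reader(f, delimiter=DELIMITER, quotechar=QUOTECHAR))


def write_loadfile(path, rows):
    """Write rows back, quoting every field (assumed to match the source files)."""
    with open(path, "w", newline="", encoding="latin-1") as f:
        writer = csv.writer(f, delimiter=DELIMITER, quotechar=QUOTECHAR,
                            quoting=csv.QUOTE_ALL)
        writer.writerows(rows)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        # Raw Latin-1 bytes of the sample line from the question.
        original = (b"\xfecol_1\xfe\xb6\xfecol_2\xfe\xb6"
                    b"\xfecol_3\xfe\xb6\xfecol_4\xfe\r\n")
        with open(path, "wb") as f:
            f.write(original)

        rows = read_loadfile(path)
        print(rows)  # [['col_1', 'col_2', 'col_3', 'col_4']]

        # Writing the rows back reproduces the original bytes.
        write_loadfile(path, rows)
        with open(path, "rb") as f:
            print(f.read() == original)  # True
    finally:
        os.remove(path)
```

Because Latin-1 decodes every byte value without error, reading never fails on unexpected bytes; the byte-for-byte round trip only holds as long as edited fields contain characters representable in Latin-1 and the file uses the csv module's default `\r\n` line terminator.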
Upvotes: 3