Reputation: 5575
I have a data format that looks similar to a CSV file, except that it has vertical bars around character strings but not around Boolean fields. For example:
|2000|,|code_no|,|first name, last name|,,,0,|word string|,0
|2000|,|code_no|,|full name|,,,0,|word string|,0
I'm not sure what format this is (it is saved as a txt file). What format is this, and how would I import it into Python?
For reference, I had been trying to use:
with open(csv_file, 'rb') as f:
    r = unicodecsv.reader(f)
And then stripping out the | from the start and end of the fields. This works OK, except for fields that contain a comma (e.g. |first name, last name|), where the field gets split because of the comma.
Upvotes: 0
Views: 53
Reputation: 155477
It looks like the pipes are being used as quote characters, not delimiters. Have you tried initializing the reader to use pipe ('|') as the quote character, and perhaps to use csv.QUOTE_NONNUMERIC as the quoting rules?
csv.reader(f, quotechar='|', quoting=csv.QUOTE_NONNUMERIC)
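As a rough sketch of how that call behaves on the sample data from the question, assuming Python 3 and the standard-library csv module rather than unicodecsv (the names `sample` and `parse_pipe_quoted` are made up for illustration):

```python
import csv
import io

# Two sample records copied from the question.
sample = (
    "|2000|,|code_no|,|first name, last name|,,,0,|word string|,0\n"
    "|2000|,|code_no|,|full name|,,,0,|word string|,0\n"
)


def parse_pipe_quoted(text):
    """Parse comma-delimited text whose string fields are wrapped in pipes.

    Hypothetical helper for illustration, not part of the csv module.
    """
    # newline="" leaves line endings untouched so the csv reader handles them.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, quotechar="|", quoting=csv.QUOTE_NONNUMERIC)
    return list(reader)


rows = parse_pipe_quoted(sample)
for row in rows:
    print(row)
```

With QUOTE_NONNUMERIC, the reader converts every non-empty unquoted field to a float, so the unquoted 0 values come back as 0.0, while the embedded comma in |first name, last name| stays inside a single field. If you would rather keep those fields as strings, passing only quotechar='|' and leaving quoting at its default should do that.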
Upvotes: 4