Reputation: 2438
I have a CSV file with some lines like:
col1,col "two",col3
so I get an Illegal quoting error, which I fixed by setting :quote_char => "\x00". That gives:
["col1", "col\"two\"", "col3"]
But later in that file there is a line like
col1,col2,"col,3"
which, with that setting, is split incorrectly:
["col1", "col2", "\"col", "3\""]
So I read the file line by line and call parse_csv wrapped in a block: I set :quote_char => "\"", rescue CSV::MalformedCSVError exceptions, and for those particular lines set :quote_char => "\x00" and retry.
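A minimal sketch of that rescue-and-retry approach, for reference. The helper name parse_with_fallback is made up for illustration; the sketch only assumes Ruby's standard csv library.

```ruby
require 'csv'

# Hypothetical helper: try normal quoting first, then fall back to
# effectively disabling quoting by using a NUL byte as the quote character.
def parse_with_fallback(line)
  line.parse_csv(quote_char: '"')
rescue CSV::MalformedCSVError
  line.parse_csv(quote_char: "\x00")
end

p parse_with_fallback('col1,col2,"col,3"')
p parse_with_fallback('col1,col "two",col3')
p parse_with_fallback('col1,col "two","col,3"')
```

The last call is the problem case: the fallback treats every comma as a separator, so the quoted "col,3" field is split in two.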
Everything works perfectly until we get the line
col1,col "two","col,3"
In this case it rescues from the exception, sets :quote_char => "\x00", and the result is
["col1", "col\"two\"", "\"col", "3\""]
Apple Numbers is able to open that file absolutely correctly.
Is there any setting for parse_csv to handle this without preprocessing the string in some way?
UPD: I show the CSV lines as they are in the file, and the results (arrays) as they were printed by p. There are no actual \" in my strings.
Upvotes: 3
Views: 236
Reputation: 11406
This is an invalid CSV file. If you have access to the source, you could generate (or ask for) the data as follows:
col1,"col ""two""","col,3"
If not, the only option is to parse the data yourself:
pseudocode:
while (read_line) {
    bool InsideQuotes = false
    for each_char_in_line {
        if (char == doublequote)
            InsideQuotes = !InsideQuotes
        if (char == ',' and !InsideQuotes)
            // separator found - process field
    }
}
This will also take care of escaped quotes like in col1,"col ""two""","col,3".
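The pseudocode above could be sketched in Ruby roughly as follows. The method name split_lenient and the field clean-up step (stripping quotes that wrap a whole field and collapsing doubled quotes) are assumptions made for illustration; the pseudocode only says "process field".

```ruby
# Hypothetical helper: split one line on commas that sit outside double quotes.
def split_lenient(line)
  fields = []
  current = +''
  inside_quotes = false

  line.chomp.each_char do |char|
    if char == '"'
      # Every quote toggles the state, so a doubled quote ("") cancels out.
      inside_quotes = !inside_quotes
      current << char
    elsif char == ',' && !inside_quotes
      # Separator found: the current field is complete.
      fields << current
      current = +''
    else
      current << char
    end
  end
  fields << current

  # Assumed clean-up: unwrap fully quoted fields and unescape "" to ".
  fields.map do |field|
    if field.length >= 2 && field.start_with?('"') && field.end_with?('"')
      field[1..-2].gsub('""', '"')
    else
      field
    end
  end
end

p split_lenient('col1,col "two","col,3"')
# => ["col1", "col \"two\"", "col,3"]
```

Note that the clean-up step treats any field that starts and ends with a quote as a quoted field, which may not suit every file.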
If the file contains multiline fields, some more work has to be done.
Upvotes: 1
Reputation: 211610
CSV is less a standard and more of a name that everyone thinks they're using to describe their quirky format correctly, and this is despite there being an RFC standard for CSV (RFC 4180), which is just another thing nobody pays attention to.
As such, a lot of programs that read CSV are very forgiving. Ruby's core CSV library is pretty good, but not as adaptable as others. That's because you've got Ruby there to get you out of a jam, and in Numbers you don't.
Try rewriting \" to "", which is conventional CSV formatting, as defined in the spec linked above:
CSV.parse(File.read(path).gsub(/\\"/, '""'))  # path: location of your CSV file
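A small self-contained sketch of that rewrite, using an in-memory string instead of a file. The sample line is invented for illustration and assumes the data really contains backslash-escaped quotes, which may not be the case for every file.

```ruby
require 'csv'

# Invented sample: a field whose inner quotes are escaped with backslashes.
raw = 'col1,"col \"two\"","col,3"'

# Turn each \" into "" so the line follows conventional CSV quoting.
normalized = raw.gsub(/\\"/, '""')

rows = CSV.parse(normalized)
p rows
# => [["col1", "col \"two\"", "col,3"]]
```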
Upvotes: 1