Yaroslav

Reputation: 2438

How can I read CSV with strange quoting in Ruby?

I have a CSV file with some lines like:

col1,col "two",col3

so I get an "Illegal quoting" error, which I fixed by setting :quote_char => "\x00". The result is then:

["col1", "col \"two\"", "col3"]

But later in the file there is a line like

col1,col2,"col,3"

which, with that setting, is parsed as

["col1", "col2", "\"col", "3\""]

So I read the file line by line and call parse_csv inside a begin/rescue block: I set :quote_char => "\"", rescue CSV::MalformedCSVError, and for those particular lines set :quote_char => "\x00" and retry.
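For illustration, the retry logic described above might look roughly like this. The name parse_with_fallback is a hypothetical helper for this sketch, not part of the CSV library:

```ruby
require 'csv'

# Hypothetical helper: try normal double-quote handling first; if the line is
# rejected as malformed, parse it again with a quote character that should
# never appear in the data, which effectively disables quoting.
def parse_with_fallback(line)
  line.parse_csv(quote_char: '"')
rescue CSV::MalformedCSVError
  line.parse_csv(quote_char: "\x00")
end

p parse_with_fallback('col1,col2,"col,3"')
p parse_with_fallback('col1,col "two",col3')
```

Each line gets exactly one quoting mode, which is why a line that needs both behaviours at once cannot come out right with this approach.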

This all works perfectly until we get the line

col1,col "two","col,3"

In this case it rescues the exception, sets :quote_char => "\x00", and the result is

["col1", "col \"two\"", "\"col", "3\""]

Apple Numbers is able to open that file absolutely correctly.

Is there any setting for parse_csv that handles this without preprocessing the string in some way?

UPD: I show the CSV lines as they are in the file, and the results (arrays) as they were printed by p. There are no actual \" in my strings.

Upvotes: 3

Views: 236

Answers (2)

Danny_ds

Reputation: 11406

This is an invalid CSV file. If you have access to the source, you could generate (or ask for) the data as follows:

col1,"col ""two""","col,3"
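As a rough illustration, assuming the producer of the file can also use Ruby, the standard library's CSV.generate_line emits this form, and the result parses back cleanly:

```ruby
require 'csv'

row = ['col1', 'col "two"', 'col,3']

# generate_line wraps any field containing a quote or a comma in double
# quotes and doubles the embedded quotes.
line = CSV.generate_line(row)
puts line

# The well-formed line parses back to the original fields.
parsed = line.parse_csv
p parsed == row
```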

If not, the only option is to parse the data yourself:

pseudocode:

while(read_line) {

    bool InsideQuotes = false
    for each_char_in_line {

        if(char == doublequote)
            InsideQuotes = !InsideQuotes 

        if(char == ',' and !InsideQuotes)
            // separator found - process field
    }
}

This will also take care of escaped quotes like in col1,"col ""two""","col,3".

If the file contains multiline fields, some more work has to be done.
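The pseudocode above could be sketched in Ruby roughly as follows. The names split_outside_quotes, unquote, and parse_lenient are hypothetical helpers for this sketch, and it assumes one record per line (no multiline fields). Stripping the enclosing quotes afterwards is an extra step that the pseudocode does not spell out:

```ruby
# Hypothetical helper: split a line on commas that are outside double quotes.
def split_outside_quotes(line)
  fields = []
  current = +''
  inside_quotes = false

  line.chomp.each_char do |char|
    # Every double quote flips the state, so a doubled quote ("") flips it
    # twice and leaves it unchanged.
    inside_quotes = !inside_quotes if char == '"'

    if char == ',' && !inside_quotes
      fields << current   # separator found: the field is complete
      current = +''
    else
      current << char
    end
  end

  fields << current
end

# Hypothetical helper: drop enclosing quotes and unescape doubled quotes.
# Fields that are not fully enclosed in quotes are left untouched.
def unquote(field)
  return field unless field.length >= 2 && field.start_with?('"') && field.end_with?('"')

  field[1..-2].gsub('""', '"')
end

def parse_lenient(line)
  split_outside_quotes(line).map { |field| unquote(field) }
end

p parse_lenient('col1,col "two","col,3"')
p parse_lenient('col1,"col ""two""","col,3"')
```

One design caveat: an unquoted field that happens to both start and end with a quote is ambiguous, and this sketch treats it as a quoted field.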

Upvotes: 1

tadman

Reputation: 211610

CSV is less a standard and more of a name that everyone thinks describes their own quirky format correctly, and this is despite there being an RFC standard for CSV, which is just another thing nobody pays attention to.

As such, a lot of programs that read CSV are very forgiving. Ruby's core CSV library is pretty good, but not as adaptable as others. That's because you've got Ruby there to get you out of a jam, and in Numbers you don't.

Try rewriting \" to "" which is conventional CSV formatting, as defined in the spec linked above:

require 'csv'
CSV.parse(File.read('data.csv').gsub(/\\"/, '""'))  # 'data.csv' is a placeholder path

Upvotes: 1
