Dave Isaacs

Reputation: 4539

Ruby CSV.parse very picky when encountering quotes

I am finding the CSV parsing in Ruby 1.9.3 to be remarkably fragile, so much so that I am wondering if I am doing something wrong.

If I do the following in irb I get an error:

1.9.3-p125 :011 > require 'csv'
 => true
1.9.3-p125 :012 > a = 'one,two,three, "four, five",six'
 => "one,two,three, \"four, five\",six" 
1.9.3-p125 :013 > arr = CSV.parse(a)
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1805:in `read'
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/csv.rb:1379:in `parse'
    from (irb):13
    from /Users/disaacs/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'

I've found that the problem is the extra space preceding the "four, five" value. If I remove the space, then it works.

1.9.3-p125 :010 > a = 'one,two,three,"four, five",six'
 => "one,two,three,\"four, five\",six" 
1.9.3-p125 :011 > arr = CSV.parse(a)
 => [["one", "two", "three", "four, five", "six"]]

Spaces in front of the other values do not cause a problem. The following parses just fine:

one, two, three,"four, five", six

Is there some parse option I am missing that would make parsing quoted values less fragile?

Upvotes: 2

Views: 2736

Answers (1)

joelparkerhenderson

Reputation: 35453

This is correct behavior. It's not being fragile.

The comma after "three" ends that field, and the next field starts immediately with the space. The opening quote therefore sits in the middle of the field instead of at its start.

You can't validly put a quote in the middle of a field (without escaping it).
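
If you cannot fix the input at its source, one possible workaround is to remove the whitespace between a comma and an opening quote before parsing. The following is only a minimal sketch: `strip_space_before_quotes` is a hypothetical helper, not part of the CSV library, and it assumes that a comma followed by spaces and a quote never occurs inside a quoted value.

```ruby
require 'csv'

line = 'one,two,three, "four, five",six'

# Strict parsing rejects a quote that is not the first character of a field.
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts "rejected: #{e.message}"
end

# Hypothetical helper (an assumption, not a CSV API): drop spaces or tabs
# between a comma and an opening quote. Naive: it would also rewrite a
# comma-space-quote sequence that appears inside a quoted value.
def strip_space_before_quotes(text)
  text.gsub(/,[ \t]+"/, ',"')
end

fields = CSV.parse_line(strip_space_before_quotes(line))
p fields
# => ["one", "two", "three", "four, five", "six"]
```

Leading spaces in front of unquoted values are left alone by this helper; the parser keeps them as part of the value.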

Upvotes: 3
