Mark Ortega

Reputation: 51

How to parse a CSV file in Ruby when the fields contain commas

I am parsing a CSV file with Ruby and am having trouble because the delimiter is a comma and my data also contains commas.

Fields that contain commas are surrounded by double quotes (""), but I am not sure how to make CSV ignore the commas inside the quotes.

Example CSV Data (File.csv)

NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR

Example Code:

require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
  puts x[1]
end

Current Output: " 84.07 FT OF 25

Expected Output: 84.07 FT OF 25, ALL OF 26,

The example file and code are available in this gist: https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c

Upvotes: 0

Views: 2452

Answers (2)

Simple Lime

Reputation: 11035

The illegal quoting error is raised when a line contains quotes that do not wrap an entire field. For instance, suppose your CSV looks like this (the second line is the one with bad quoting):

NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592  BLK 14  LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
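To see which of the two lines triggers the error, here is a minimal sketch that feeds each sample line to CSV.parse_line with default options. The helper name malformed? and the constants are made up for illustration and are not part of the csv library.

```ruby
require 'csv'

# The two sample lines from above.
GOOD_LINE = 'NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'
BAD_LINE  = 'NCB 14592  BLK 14  LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR'

# Hypothetical helper: returns true if the default parser rejects the line.
def malformed?(line)
  CSV.parse_line(line)
  false
rescue CSV::MalformedCSVError
  true
end

puts malformed?(GOOD_LINE) # false: the quotes wrap the whole field
puts malformed?(BAD_LINE)  # true: quotes appear inside an unquoted field
```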

You could parse each line individually and change the quote character only for the lines that use bad quoting:

require 'csv'

def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end

def parse_line(line)
  options = { encoding:'iso-8859-1:utf-8' }
  begin
    # ** is needed on Ruby 3+, where a positional hash is no longer treated as keyword arguments
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # this line is misusing quotes, change the quote character and try again
    options.merge! quote_char: "\x00"

    retry
  end
end

parse_file('./File.csv')

Running this gives you:

["NCB 14591  BLK 13  LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592  BLK 14  LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]

However, if a single row mixes bad quoting and good quoting, this falls apart again. Ideally, you would clean up the CSV so that it is valid.
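To illustrate the mixed-quoting case, here is a small sketch using a made-up row (it is not from the original file) in which one field is quoted correctly and another misuses quotes. The helper parse_with_fallback is hypothetical and only mirrors the retry strategy above.

```ruby
require 'csv'

# Made-up row: the second field is quoted correctly, the third misuses quotes.
MIXED_LINE = 'NCB 14593,"84.07 FT OF 25, ALL OF 26",FT OF "27"'

# Hypothetical helper: try the default options first, then fall back to a
# quote character that should never occur in the data.
def parse_with_fallback(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  CSV.parse_line(line, quote_char: "\x00")
end

# The fallback splits on every comma, so the correctly quoted field is
# broken in two and the quote characters stay in the data.
p parse_with_fallback(MIXED_LINE)
```

The row comes back as four fields instead of three, which is why fixing the source data is the more reliable route.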

Upvotes: 0

Marko Tunjic

Reputation: 1879

Try the force_quotes option:

require 'csv'
CSV.foreach("data.csv", encoding:'iso-8859-1:utf-8', quote_char: '"', force_quotes: true) do |x|
  puts x[1]
end

Result:

84.07 FT OF 25, ALL OF 26,
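For comparison, here is a minimal sketch that parses the sample line from the question twice: once with the default quote character and once with quote_char set to "\x00" as in the question's code. As far as the csv documentation goes, force_quotes is listed among the options for generating CSV, so the correct result above most likely comes from keeping the default quote character rather than from force_quotes itself. The helper second_field is hypothetical.

```ruby
require 'csv'

# The sample line from the question.
SAMPLE = 'NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'

# Hypothetical helper: returns the second field of a line parsed with the given options.
def second_field(line, **options)
  CSV.parse_line(line, **options)[1]
end

p second_field(SAMPLE)                     # default quote_char ('"'): commas inside the quotes are kept
p second_field(SAMPLE, quote_char: "\x00") # quote handling effectively disabled: field is cut at the first comma
```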

Upvotes: 2
