dnatoli

Reputation: 7012

Split a CSV file while ignoring commas and return characters within quotes using Ruby

I want to split a CSV line into its separate fields; however, some of the fields contain commas or newline characters.
If I use line.split(',') it also splits on the commas inside the quotes, and if I use the CSV class it gives me an illegal format error because of the newlines.

Upvotes: 1

Views: 2737

Answers (3)

pbnelson

Reputation: 1749

FasterCSV (which became Ruby's standard CSV library as of Ruby 1.9, loaded with require 'csv') has a handy parse_line() method that works like .split(',') in that it returns an array of fields, but it respects the rule that commas inside double-quoted strings are not separators.

require 'csv'
CSV.parse_line(line)

Example...

require 'csv'
line = '"PBN, Inc.",100,10'

puts(line.chomp.split(','))
# "PBN
#  Inc."
# 100
# 10

puts(CSV.parse_line(line))
# PBN, Inc.
# 100
# 10
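parse_line is not limited to one physical line: it reads the first complete record from the string, so a quoted field may also contain a newline, as long as the whole record is passed in. A minimal sketch; the sample record is made up:

```ruby
require 'csv'

# One record whose first field contains both a comma and a newline
# (hypothetical sample data).
record = %Q("PBN, Inc.\nSuite 5",100,10\n)

fields = CSV.parse_line(record)
p fields
# => ["PBN, Inc.\nSuite 5", "100", "10"]
```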

Upvotes: 3

gertas

Reputation: 17145

I'm sure implementing this yourself would be reinventing the wheel. If the stdlib's CSV class doesn't satisfy you, try another implementation such as FasterCSV.

Make sure your input format is valid: newlines, commas and escaped quotes may appear only inside quoted fields.
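In conventional CSV (as described in RFC 4180), a literal quote inside a quoted field is escaped by doubling it. A short sketch of what Ruby's CSV accepts and what it rejects; the sample strings are made up:

```ruby
require 'csv'

# Valid: the field is quoted and the inner quotes are doubled ("").
ok = CSV.parse_line('"He said ""hi"", then left",2')
p ok # => ["He said \"hi\", then left", "2"]

# Invalid: a bare quote inside an unquoted field.
bad_error = begin
  CSV.parse_line('He said "hi",2')
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts "malformed input: #{bad_error.message}" if bad_error
```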

Update: According to the question "Generating fields containing newline with Ruby CSV::Writer", the stdlib's CSV has problems with fields containing newlines. I suppose it first splits the input into rows blindly, using newline as the separator without taking any quoting into account.
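With the CSV library shipped in current Ruby versions (the former FasterCSV), quoted fields containing newlines parse correctly as long as the whole text is handed to the parser. Splitting the input into lines first, for example with each_line, cuts such a field in half and is one common source of malformed-CSV errors. A sketch with made-up data:

```ruby
require 'csv'

# Two data records; the quoted field in the first one spans two physical
# lines (hypothetical sample data).
text = %Q(id,note\n1,"first line\nsecond line"\n2,plain\n)

# Parsing line by line breaks the quoted field apart.
line_by_line_error = begin
  text.each_line.map { |line| CSV.parse_line(line) }
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts "line by line: #{line_by_line_error.class}"

# Parsing the whole text lets CSV find the closing quote on the next line.
rows = CSV.parse(text)
p rows
# => [["id", "note"], ["1", "first line\nsecond line"], ["2", "plain"]]
```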

Upvotes: 8

David Unric

Reputation: 7719

You need to know the format of the input CSV file, and the file needs to be valid. If commas are not the field separators, you have to specify which character is.
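For example, with a semicolon-separated record, telling CSV about the separator through the col_sep option means commas become ordinary characters inside a field. The sample line is made up:

```ruby
require 'csv'

# Semicolon-separated record; the comma in the middle field is data.
line = "Smith;PBN, Inc.;42\n"

default_fields   = CSV.parse_line(line)               # separator defaults to ","
semicolon_fields = CSV.parse_line(line, col_sep: ';')

p default_fields   # => ["Smith;PBN", " Inc.;42"]
p semicolon_fields # => ["Smith", "PBN, Inc.", "42"]
```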

Processing a CSV file may look like this:

require 'csv'

# fname_in holds the path of the input file
CSV.foreach(fname_in, :col_sep => ';', :quote_char => '"',
                      :headers => true,
                      :encoding => Encoding::UTF_8) do |row|
  # do some stuff with the row
end

As you can see, there are further options for describing the input format; see the Ruby documentation for the CSV class in the csv library.
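A self-contained version of such a loop, assuming a small semicolon-separated UTF-8 file with a header row. The column names and contents are made up, and the file is written to a temporary location (via Tempfile.create) only so the snippet can run on its own:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical input: header row, ';' separator, and a quoted field that
# contains both a comma and a newline.
data = %Q(name;city\n"PBN, Inc.\nSuite 5";Boston\nAcme;Denver\n)

names = []
Tempfile.create(['sample', '.csv']) do |f|
  f.write(data)
  f.flush

  CSV.foreach(f.path, col_sep: ';', quote_char: '"',
                      headers: true, encoding: 'UTF-8') do |row|
    # With headers: true each row is a CSV::Row, indexable by header name.
    names << row['name']
  end
end

p names # => ["PBN, Inc.\nSuite 5", "Acme"]
```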

Upvotes: 2
