kikito
kikito

Reputation: 52651

FasterCSV: Check whether a file is invalid before accepting it - is there a simpler way?

I'm using FasterCSV on a Ruby on Rails application and currently it throws an Exception if the file is invalid.

I've looked over the FasterCSV doc, and it seems that if I use FasterCSV::parse with a block, it'll read the file one line at a time, without allocating too much memory. It'll throw a FasterCSV::MalformedCSV exception if there is any kind of error on the file.

I've implemented a custom solution, but I'm not sure it's the best possible one (see my answer below). I'd be interested in knowing alternatives

Upvotes: 3

Views: 3001

Answers (3)

kikito
kikito

Reputation: 52651

I made some tests yesterday and it turns out that my solution didn't quite work; I kept getting empty arrays on valid CSVs after implementing the first is_valid . I'm not sure whether it's a FasterCSV caching issue or something in my code, and I don't know if it's related with my test setup, but I decided to go implement a safe_parse instead:

#/lib/faster_csv_safe_parse.rb
class FasterCSV

  def self.safe_parse(file, options = {})
    begin
      FasterCSV.parse(file, options)
    rescue FasterCSV::MalformedCSVError
      nil
    end
  end

end

This will return a parsed array if the file is valid, or nil otherwise. I could then implement my validations as follows:

# /models/csv_importer.rb

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format
  attr_accessor csv_data

  def csv_data
    @csv_data ||= FasterCSV.safe_parse(file)
  end

...

  private

  def check_file_format
    errors.add :file, "Malformed CSV! Please check syntax" if csv_data.nil?
  end
end

I guess it would be possible to implement a safe_parse that accepts a block and parses the file line by line, but for my purposes this simple implementation was enough, and it works in all cases.

Upvotes: 0

kikito
kikito

Reputation: 52651

This is my current solution. I'm really interested in knowing improvements / alternatives.

# /lib/fastercsv_is_valid.rb

class FasterCSV

  def self.is_valid?(file, options = {})
    begin
      FasterCSV.parse(file, options) { |row| }
      true
    rescue FasterCSV::MalformedCSV
      false
    end
  end

end

I use that method like this:

# /models/csv_importer.rb

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format

...

  private

  def check_file_format
    errors.add :file, "Malformed CSV! Please check syntax" unless FasterCSV::is_valid? file
  end
end

Upvotes: 1

xinit
xinit

Reputation: 1916

I assume you want to parse the CSV and do something with the parsed results. Worst case is that your CSV is valid and that you parse the file again. I would write something like this to stash away the parsed result so you only have to parse the CSV once:

module FasterCSV

  def self.parse_and_validate(file, options = {})

    begin
      @parsed_result = FasterCSV.parse(file, options) { |row| }
    rescue FasterCSV::MalformedCSV
      @invalid = true
    end
  end

  def self.is_valid?
    !@invalid
  end    

  def self.parsed_result
    @parsed_result if self.valid?
  end

end

And then:

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format

  # I assume you use the parsed result after the validations so in a before_save or something
  def do_your_parse_stuff
    here you would use FasterCSV::parsed_result
  end
...

  private

  def check_file_format
    FasterCSV::parse_and_validate(file)
    errors.add :file, "Malformed CSV! Please check syntax" unless FasterCSV::is_valid?
  end
end

In the above case, you might want to move stuff into a different class that takes care of communicating with FasterCSV and stashing away the parsed result, because I don't think my example is thread safe :)

Upvotes: -1

Related Questions