Daniel Bonnell

Reputation: 4997

ArgumentError: invalid byte sequence in UTF-8 when creating CSV from TempFile

I have the following two lines of code that take an uploaded CSV file from params and return an array of hashes, one per contact row. The code works fine when I upload a CSV with UTF-8 encoding. If I upload a CSV with any other encoding, though, it breaks. How can I adjust the code to detect the encoding of the uploaded file and convert it to UTF-8?

CSV::Converters[:blank_to_nil] = lambda { |field| field && field.empty? ? nil : field }
csv = CSV.new(params[:file].tempfile.open, headers: true, header_converters: :symbol, converters: [:all, :blank_to_nil]).to_a.map {|row| row.to_hash }

This question is not a duplicate! I've seen numerous other questions on here revolving around the same encoding issue, but the specifics of those are different from my case. Specifically, I need a way to convert the encoding of a TempFile generated from my params hash. Other solutions I've seen involve encoding String and File objects, as well as passing an encoding option to CSV.parse or CSV.open. I've tried those solutions already without success.

I've tried passing in an encoding option to CSV.new, like so:

csv = CSV.new(params[:file].tempfile.open, encoding: 'iso-8859-1:utf-8', headers: true, header_converters: :symbol, converters: [:all, :blank_to_nil]).to_a.map {|row| row.to_hash }

I've also tried this:

csv = CSV.new(params[:file].tempfile.open, encoding: 'iso-8859-1:utf-8', headers: true, header_converters: :symbol, converters: [:all, :blank_to_nil]).to_a.map {|row| row.to_hash }

I've tried adjusting my converter as well, like so:

CSV::Converters[:blank_to_nil] = lambda { |field| field && field.empty? ? nil : field.encode('utf-8') }

I'm looking for a programmatic solution here that does not require the user to convert their CSV to the proper encoding.

Upvotes: 0

Views: 616

Answers (2)

rii

Reputation: 1658

I've also had to deal with this problem and here is how I finally solved it.

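  require 'csv'

  # Copy the upload line by line into a new CSV file, dropping invalid bytes.
  CSV.open(new_csv_file, 'w') do |csv_object|
    File.foreach(uploaded_file) do |line|
      # Re-encode UTF-8 to UTF-8, replacing invalid or undefined bytes with ''.
      csv_object << line.encode("utf-8", "utf-8", invalid: :replace, undef: :replace, replace: '').parse_csv
    end
  end
  CSV.new(File.read(new_csv_file))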

Basically, go through every line, sanitize it, and shove it into a new CSV file. Note that this drops any bytes that are not valid UTF-8 rather than converting them. Hope that leads you and others in the right direction.
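The same sanitizing idea can also be sketched in memory, without the intermediate file. This is only a minimal sketch: `parse_scrubbed_csv` is a hypothetical helper name, it assumes Ruby 2.1 or later (for `String#scrub`), and like the code above it discards invalid bytes instead of converting them from their real encoding.

```ruby
require 'csv'

# Hypothetical helper (not from the original answer): read the raw bytes,
# drop anything that is not valid UTF-8, then parse the result as CSV.
def parse_scrubbed_csv(path)
  raw = File.binread(path)                       # bytes tagged as ASCII-8BIT
  text = raw.force_encoding(Encoding::UTF_8)     # reinterpret the bytes as UTF-8
  text = text.scrub('')                          # remove invalid byte sequences
  CSV.parse(text, headers: true, header_converters: :symbol).map(&:to_h)
end
```

In a Rails controller this could presumably be called as `parse_scrubbed_csv(params[:file].tempfile.path)`, since a Tempfile exposes its location through `path`.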

Upvotes: 1

Aetherus

Reputation: 8898

You can use filemagic to detect the encoding of a file, although it's not 100% accurate. It is based on the system's file command, so I'm not sure whether it works on Windows.
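If installing filemagic is not an option, a rougher stdlib-only alternative is to try a short list of candidate encodings in order. The sketch below does not use filemagic and is not real detection: `read_as_utf8` is a hypothetical helper, the candidate list is an assumption, and ISO-8859-1 accepts any byte sequence, so it only works as a last-resort fallback.

```ruby
# Hypothetical helper (an assumption, not part of filemagic): return the file
# content as UTF-8 by trying each candidate encoding in order.
def read_as_utf8(path, candidates = [Encoding::UTF_8, Encoding::ISO_8859_1])
  raw = File.binread(path)                      # raw bytes, no transcoding
  candidates.each do |enc|
    text = raw.dup.force_encoding(enc)          # reinterpret bytes as this encoding
    next unless text.valid_encoding?            # skip encodings the bytes do not fit
    begin
      return text.encode(Encoding::UTF_8)       # transcode to UTF-8
    rescue EncodingError
      next                                      # bytes not convertible, try the next one
    end
  end
  raise ArgumentError, "could not decode #{path} with any candidate encoding"
end
```

The returned string can then be handed to `CSV.parse` with the same `headers:` and `converters:` options used in the question.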

Upvotes: 0
