ju_ro

Reputation: 37

How to get past the Excel CSV encoding nightmare: "\xEF" from ASCII-8BIT to UTF-8 using Ruby on Rails

I'm trying to parse CSV files in Rails. This works great except for files saved in Excel (tested with version 16.26) on both Windows and Mac; CSVs saved in Numbers and Google Sheets work fine. Any character with an accent raises Encoding::UndefinedConversionError: "\xEF" from ASCII-8BIT to UTF-8.

Excel claims it saves in UTF-8.

I want accented characters to not throw errors when I upload CSVs saved in Excel.

Things I've tried:

  1. Setting the read encoding to bom|utf-8 (as per the BOM link), utf-8, iso-8859-1, utf-16, windows-1252, and ascii-8bit, including cycling through these in an array and dropping each one that fails.

  2. The current code uses ISO8859-1:UTF-8, which is supposed to read the data as ISO8859-1 and then transcode it to UTF-8.

  3. Creating a tempfile, setting it to binmode, and calling CSV.parse(temp.path, encoding: "bom|utf-8"), per the first answer in this thread.

data = CSV.parse(csv, headers: true, header_converters: :symbol, skip_blanks: true, encoding: 'ISO8859-1:UTF-8')
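For context on the error itself: \xEF is the first byte of the UTF-8 byte order mark (EF BB BF), which Excel writes when saving as "CSV UTF-8". One possible explanation, not confirmed in this thread, is that the upload arrives as a binary (ASCII-8BIT) string, so the BOM and accented bytes cannot be converted implicitly. A minimal sketch of normalizing the string before parsing follows; the helper name parse_excel_csv and the Windows-1252 fallback are assumptions for illustration, not part of the original code.

```ruby
require "csv"

# Hypothetical helper: takes the raw uploaded bytes as a String and
# returns a CSV::Table with UTF-8 fields.
def parse_excel_csv(raw)
  # Reinterpret the bytes as UTF-8 without changing them.
  text = raw.dup.force_encoding(Encoding::UTF_8)

  # Assumption: if the bytes are not valid UTF-8, the file was probably
  # saved as a legacy Windows "CSV", so treat it as Windows-1252.
  unless text.valid_encoding?
    text = raw.dup.force_encoding("Windows-1252")
              .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end

  # Drop the UTF-8 byte order mark so it does not end up in the first header.
  text = text.delete_prefix("\uFEFF")

  CSV.parse(text, headers: true, header_converters: :symbol, skip_blanks: true)
end
```

With this approach the encoding: option is left out entirely, since the string already carries the intended encoding by the time CSV.parse sees it.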

It also works if I take a CSV saved in Excel, re-save it in Google Sheets or Numbers, and then upload it. Unfortunately, Excel is the most common source of the CSVs our users upload.
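A note on the tempfile attempt listed above: CSV.parse treats its first argument as CSV text, so passing temp.path parses the path string itself rather than the file's contents. Reading from a path is what CSV.read (or CSV.foreach) is for, and there the "bom|utf-8" encoding is applied when the file is opened. A rough sketch, assuming the upload is available as a string of raw bytes; the method name read_csv_upload is made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: writes the raw bytes to a temporary file and lets
# CSV.read open it with BOM detection.
def read_csv_upload(raw_bytes)
  Tempfile.create(["upload", ".csv"]) do |file|
    file.binmode           # write the bytes exactly as received
    file.write(raw_bytes)
    file.flush             # make sure everything is on disk before reading

    # "bom|utf-8" strips a leading UTF-8 BOM if present and reads as UTF-8.
    CSV.read(file.path,
             headers: true,
             header_converters: :symbol,
             skip_blanks: true,
             encoding: "bom|utf-8")
  end
end
```

This only helps for files that really are UTF-8; a file saved in a legacy code page would still need to be transcoded first.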

Upvotes: 2

Views: 2441

Answers (1)

ju_ro

Reputation: 37

Solved by using the csvreader gem. Ruby's built-in CSV parser did not cope well with these Excel files in my Rails app.

Upvotes: 0
