Reputation: 1641
(Ruby 2.5) I have a method that reads and parses a CSV file that's being uploaded via Alchemy CMS:
def process_csv(csv_file, current_user_id, original_filename)
  lock_importer
  errors = []
  index = 0
  string_converter = lambda { |field| field.strip }
  total = CSV.foreach(csv_file, headers: true).count
  csv_string = csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
  CSV.parse(csv_string, headers: true, header_converters: :symbol, skip_blanks: true, converters: [string_converter]) do |row|
    # do other stuff
  end
end
But when I try to upload a CSV file that has a column (name) containing a string with special characters, I receive the error "Invalid byte sequence in UTF-8". I'm testing with the value N'öt Réal Stô'rë.
I've tried a few solutions that I found on the web but no luck - any suggestions?
Upvotes: 0
Views: 1300
Reputation: 27885
It's unclear what your csv_file is. I guess it is a File object.
Sometimes I get CSV files from Excel encoded as UTF-16, so let's try an example.
I have a CSV file stored in UTF-16BE with the following content:
line;comment;UmlautÄ
1;Das ist UTF-16 BE;Ä
2;öüäÖÄÜ;Ä
If I execute the following code:
require 'csv'

def process_csv(csv_file)
  csv_string = csv_file.read#.encode!("UTF-8", "iso-8859-1", invalid: :replace)
  CSV.parse(csv_string, headers: true, skip_blanks: true, col_sep: ';') do |row|
    p row # do other stuff
  end
end

process_csv(File.open('example_utf16BE.txt'))
then I also get an "Invalid byte sequence in UTF-8" error.
If I use
process_csv(File.open('example_utf16BE.txt', 'rb', encoding: 'BOM|utf-16BE'))
then everything works.
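To try this without Excel, the following is a minimal, self-contained sketch of the same experiment. The temporary directory, the file contents, and the explicit encode('UTF-8') before parsing are my own assumptions for illustration, not something taken from your setup.

```ruby
require 'csv'
require 'tmpdir'

# Sample data mirroring the example file above (written here as UTF-8 source).
content = "line;comment;UmlautÄ\n1;Das ist UTF-16 BE;Ä\n2;öüäÖÄÜ;Ä\n"

# Store it as UTF-16BE bytes with a leading byte order mark.
path = File.join(Dir.mktmpdir, 'example_utf16BE.txt')
File.binwrite(path, "\uFEFF#{content}".encode('UTF-16BE'))

# 'BOM|UTF-16BE' tells Ruby to strip the BOM and tag the data as UTF-16BE.
# The string is transcoded to UTF-8 before it is handed to CSV.parse.
rows = File.open(path, 'rb', encoding: 'BOM|UTF-16BE') do |f|
  CSV.parse(f.read.encode('UTF-8'), headers: true, skip_blanks: true, col_sep: ';').map(&:to_h)
end

rows.each { |row| p row }
```

Transcoding to UTF-8 first is a design choice: everything after the read then deals with a single encoding.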
So my guess is that you receive a File object with the wrong encoding, and that the call csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace) is an attempt to repair this problem.
What you can do:
Add this to your code:
p csv_file
p csv_file.external_encoding
You should get
#<File:example_utf16BE.txt>
#<Encoding:UTF-16BE>
Now check whether the file (in my example: example_utf16BE.txt) really has the encoding reported in the second line.
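One rough way to do that check in Ruby is to read the raw bytes and ask whether they form a valid string in the reported encoding. The helper name content_valid_in? and the sample file are hypothetical, and I assume here that the upload is actually ISO-8859-1. Note that a positive result proves nothing (ISO-8859-1 accepts any byte sequence), but a negative result rules the encoding out.

```ruby
require 'tmpdir'

# Hypothetical helper: true when the raw bytes of the file at `path`
# form a valid string in `encoding`.
def content_valid_in?(path, encoding)
  File.binread(path).force_encoding(encoding).valid_encoding?
end

# Sample file holding the test value from the question as ISO-8859-1 bytes
# (an assumption about what the real upload looks like).
path = File.join(Dir.mktmpdir, 'latin1.csv')
File.binwrite(path, "name\nN'öt Réal Stô'rë\n".encode('ISO-8859-1'))

utf8_ok   = content_valid_in?(path, 'UTF-8')      # false: the bytes are not valid UTF-8
latin1_ok = content_valid_in?(path, 'ISO-8859-1') # true
puts utf8_ok, latin1_ok
```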
If not, try to change how the File object is created so that it is opened with the correct encoding.
If this is not possible, you can try calling csv_file.set_encoding 'utf-8' (or whatever encoding the file really has) to change the encoding before you read the content.
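As a sketch of that last option: assuming the uploaded file is really ISO-8859-1 (I cannot verify that from the question), set_encoding can declare the external encoding and ask Ruby to transcode to UTF-8 while reading. The method name read_latin1_csv and the sample file are made up for this example.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: re-tags an already opened File as ISO-8859-1 and
# lets Ruby transcode to UTF-8 on read, then parses the result.
def read_latin1_csv(csv_file)
  csv_file.set_encoding('ISO-8859-1', 'UTF-8') # external encoding, internal encoding
  CSV.parse(csv_file.read, headers: true, header_converters: :symbol, skip_blanks: true).map(&:to_h)
end

# Sample upload with the test value from the question, stored as ISO-8859-1 bytes.
path = File.join(Dir.mktmpdir, 'upload.csv')
File.binwrite(path, "name\nN'öt Réal Stô'rë\n".encode('ISO-8859-1'))

# The file is opened as UTF-8 on purpose to simulate the wrong encoding.
rows = File.open(path, 'r:UTF-8') { |f| read_latin1_csv(f) }
p rows
```

The important part is that set_encoding is called before the first read. Once the content has been read as UTF-8, the string is already tagged with the wrong encoding.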
Upvotes: 0